    ind_list = [1, 3]
    df.ix[ind_list]

should do the trick!
When I index data frames I always use the `.ix` indexer. It's so much easier and more flexible...


**UPDATE**
This is no longer the accepted method for indexing. The `.ix` indexer was deprecated in pandas 0.20 and removed in pandas 1.0. Use `.iloc` for integer-position-based indexing and `.loc` for label-based indexing. See the example below:

    ind_list = [1, 3]
    df.iloc[ind_list]
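To make the `.iloc` versus `.loc` distinction concrete, here is a minimal sketch. The frame is hypothetical: its column names `a` and `b` are invented, and the index values are borrowed from the question's sample data.

```python
import pandas as pd

# Hypothetical frame modeled on the question: date-like integers as index labels.
df = pd.DataFrame(
    {"a": [10.103, 15.915, 3.196, 7.907], "b": [7.981, 12.686, 2.710, 6.459]},
    index=[20060930, 20061231, 20070331, 20070630],
)

ind_list = [1, 3]

# Position-based: second and fourth rows, whatever their labels are.
by_position = df.iloc[ind_list]

# Label-based: the labels themselves must be passed.
by_label = df.loc[[20061231, 20070630]]

print(by_position.equals(by_label))  # True
```

With this index, `df.loc[ind_list]` would raise a `KeyError`, because 1 and 3 are positions here, not labels.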



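You can also use `iloc`:

    df.iloc[[1,3],:]

`iloc` selects by position, so this returns the second and fourth rows whatever their index labels are. It will not return the rows labelled 1 and 3 if the index no longer corresponds to the order of the rows due to prior computations. In that case use:

    df[df.index.isin([1,3])]

... as suggested in other responses.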
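A small sketch of the situation described above, using a hypothetical single-column frame whose rows have been reordered by a sort. Positional selection and a label mask then pick different rows:

```python
import pandas as pd

# Hypothetical frame with a default 0..3 index, then reordered by a sort.
df = pd.DataFrame({"value": [40, 10, 30, 20]})
shuffled = df.sort_values("value")  # index order is now 1, 3, 2, 0

# Positional: second and fourth rows of the reordered frame.
positional = shuffled.iloc[[1, 3]]

# Label-based mask: rows whose index labels are 1 or 3, wherever they sit.
labelled = shuffled[shuffled.index.isin([1, 3])]

print(positional.index.tolist())  # [3, 0]
print(labelled.index.tolist())    # [1, 3]
```

Note that the mask keeps rows in the frame's current order, not in the order of the list passed to `isin`.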



Another way, although the code is longer, was faster than the approaches above in my timings. Check it yourself with the `%timeit` magic:

    df[df.index.isin([1,3])]

PS: I leave it to you to figure out the reason.

[![enter image description here][1]][1]

  [1]: https://i.stack.imgur.com/vTHuZ.png

If `index_list` contains your desired indices (row positions), you can get the DataFrame with the desired rows by doing

```
index_list = [1,2,3,4,5,6]
df.loc[df.index[index_list]]
```

This is based on the latest [documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#combining-positional-and-label-based-indexing) as of March 2021.
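As a rough illustration of why translating positions into labels can be useful, the sketch below selects rows by position and a column by name in one `.loc` call. The frame, its column names and its string index are all made up for the example.

```python
import pandas as pd

# Hypothetical frame with string labels so positions and labels cannot be confused.
df = pd.DataFrame(
    {"open": [1.0, 2.0, 3.0, 4.0], "close": [1.5, 2.5, 3.5, 4.5]},
    index=["w", "x", "y", "z"],
)
index_list = [1, 3]

# Translate positions into labels, then select rows by label and a column by name.
subset = df.loc[df.index[index_list], ["close"]]
print(subset)
```

If the index contains duplicate labels, this approach can return more rows than `index_list` has entries, so `df.iloc` is the safer choice in that case.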

For large datasets, it is memory efficient to read only selected rows via the `skiprows` parameter.


**Example**

    pred = lambda x: x not in [1, 3]
    pd.read_csv("data.csv", skiprows=pred, index_col=0, names=...)

This returns a DataFrame built from only rows 1 and 3 of the file; every other row is skipped.
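For a version that runs without a file on disk, here is a minimal sketch that reads from an in-memory string. The CSV content is invented, and unlike the example above it keeps the header row (file row 0) instead of supplying `names`.

```python
import io

import pandas as pd

# Hypothetical CSV: file row 0 is the header, rows 1..4 hold the data.
csv_text = "id,value\na,10\nb,20\nc,30\nd,40\n"

# Keep the header plus file rows 2 and 4; skip everything else.
keep = {0, 2, 4}
df = pd.read_csv(
    io.StringIO(csv_text),
    skiprows=lambda i: i not in keep,  # True means "skip this row"
    index_col=0,
)
print(df)
```

The callable receives file row numbers, so the header counts as row 0 and the first data row is row 1.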

---

**Details**

From the [docs][0]:

> `skiprows` : list-like or integer or callable, default `None`
>
> ...
>
> If callable, the callable function will be evaluated against the row indices, returning True if the row should be skipped and False otherwise. An example of a valid callable argument would be `lambda x: x in [0, 2]`

This feature works in pandas 0.20.0 and later. See also the [corresponding issue][1] and a [related post][2].


  [0]: https://pandas.pydata.org/pandas-docs/version/0.21/generated/pandas.read_csv.html
  [1]: https://github.com/pandas-dev/pandas/issues/10882
  [2]: https://stackoverflow.com/questions/39677183/quickly-sampling-large-number-of-rows-from-large-dataframes-in-python/39677807#39677807

There are many ways of solving this problem, and the ones listed above are the most commonly used. I want to add two more, in case someone is looking for an alternative.

    index_list = [1,3]

    df.take(index_list)

    #or

    df.query('index in @index_list')
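A short runnable sketch of both alternatives on a hypothetical frame with the default integer index. `take` works on positions, while the `query` expression matches index labels, so the two only agree when labels and positions coincide, as they do here.

```python
import pandas as pd

# Hypothetical frame with the default 0..3 index, so labels equal positions.
df = pd.DataFrame({"value": [40, 10, 30, 20]})
index_list = [1, 3]

# Positional selection, comparable to df.iloc[index_list].
taken = df.take(index_list)

# Label-based selection: keeps rows whose index label appears in index_list.
queried = df.query("index in @index_list")

print(taken.equals(queried))  # True
```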




I'm trying to reproduce my Stata code in Python, and I was pointed in the direction of Pandas. I am, however, having a hard time wrapping my head around how to process the data.

Let's say I want to iterate over all values in the column headed 'ID'. If that ID matches a specific number, then I want to change two corresponding values, FirstName and LastName.

In Stata it looks like this:

    replace FirstName = "Matt" if ID==103
    replace LastName =  "Jones" if ID==103

So this replaces all values in FirstName that correspond with values of ID == 103 to Matt.

In Pandas, I'm trying something like this

    df = read_csv("test.csv")
    for i in df['ID']:
        if i ==103:
              ...

Not sure where to go from here. Any ideas?


Change one value based on another value in pandas

A comparison of outputs reveals differences:

    user@user-VirtualBox:~$ pip list
    feedparser (5.1.3)
    pip (1.4.1)
    setuptools (1.1.5)
    wsgiref (0.1.2)
    user@user-VirtualBox:~$ pip freeze
    feedparser==5.1.3
    wsgiref==0.1.2

Pip's documentation states

    freeze                      Output installed packages in requirements format.
    list                        List installed packages.

but what is "requirements format," and why does `pip list` generate a more comprehensive list than `pip freeze`?


Pip freeze vs. pip list

I have a DataFrame `df`:
```
20060930  10.103       NaN     10.103   7.981
20061231  15.915       NaN     15.915  12.686
20070331   3.196       NaN      3.196   2.710
20070630   7.907       NaN      7.907   6.459
```
Then I want to select the rows whose sequence numbers are given in a list, say `[1,3]`, which should leave:
```
20061231  15.915       NaN     15.915  12.686
20070630   7.907       NaN      7.907   6.459
```
How can I do that, and which function should I use?

Select Pandas rows based on list index

I'm trying to find out why the use of `global` is considered to be bad practice in Python (and in programming in general). Can somebody explain? Links with more info would also be appreciated.

Why are global variables evil?

I have an integer

    {% set curYear = 2013 %}

In an `{% if %}` statement I have to compare it with some string. I can't set `curYear` to a string at the beginning because I have to decrement it in a loop.

How can I convert it?

Convert integer to string Jinja

I'm pretty new to `numpy` and I am having a hard time understanding how to extract from a `np.array` a submatrix with defined columns and rows:

    Y = np.arange(16).reshape(4,4)
If I want to extract columns/rows 0 and 3, I should have:

    [[0 3]
     [12 15]]
I tried all the reshape functions...but cannot figure out how to do this.  Any ideas?

Numpy extract submatrix

I have some points and I am trying to fit a curve to them. I know that the `scipy.optimize.curve_fit` function exists, but I do not understand the documentation, i.e. how to use this function.

My points: `np.array([(1, 1), (2, 4), (3, 1), (9, 3)])`

Can anybody explain how to do that?

python numpy/scipy curve fitting

I am unable to create a single table using SQLAlchemy.

I can create it by calling `Base.metadata.create_all(engine)`, but as the number of tables grows, this call takes a long time.

I create table classes on the fly and then populate them.

    from sqlalchemy import create_engine, Column, Integer, Sequence, String, Date, Float, BIGINT
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import sessionmaker

    Base = declarative_base()

    class HistoricDay():
    	
    	id = Column(Integer, Sequence('id_seq'), primary_key=True)
    	#  Date, Open, High, Low, Close, Volume, Adj Close
    	date = Column(Date)
    	open = Column(Float)
    	high = Column(Float)
    	low = Column(Float)
    	close = Column(Float)
    	volume = Column(BIGINT)
    	adjClose = Column(Float)
    	
    	def __init__(self, date, open, high, low, close, volume, adjClose):
    		self.date = date
    		self.open = open
    		self.high = high
    		self.low = low
    		self.close = close
    		self.volume = volume
    		self.adjClose = adjClose

    def build_daily_history_table_repr(self):
    		return "<"+self.__tablename__+"('{}','{}','{}','{}','{}','{}','{}','{}')>".format(self.id, self.date, self.open, self.high, self.low, self.close, self.volume, self.adjClose)
    		
    def build_daily_history_table(ticket):
    	classname = ticket+"_HistoricDay"
    	globals()[classname] = type(classname, (HistoricDay,Base), {'__tablename__' : ticket+"_daily_history"})
    	setattr(globals()[classname], '__repr__', build_daily_history_table_repr)

	# Initialize the database :: Connection & Metadata retrieval
	engine = create_engine('mysql+cymysql://root@localhost/gwc?charset=utf8&use_unicode=0', pool_recycle=3600) # ,echo = True
	
	# SqlAlchemy :: Session setup
	Session = sessionmaker(bind=engine)

	# Create all tables that do not already exist
	Base.metadata.create_all(engine)

	# SqlAlchemy :: Starts a session
	session = Session()

    ticketList = getTicketList()

    for ticket in ticketList:
    	build_daily_history_table(ticket)
    	class_name = ticket+"_HistoricDay"
    	
    	meta_create_all_timer = time.time()
    	# Create all tables that do not already exist
    	# globals()[class_name]('2005-07-24',0,0,0,0,0,0).create(engine)  #doesn't work
    	#(globals()[class_name]).__table__.create(engine) #doesn't work
    	# session.commit() #doesn't work
    	
    	#Base.metadata.create_all(engine) # works but gets very slow
    	print("  meta_create_all_timer {}s".format(time.time()-meta_create_all_timer))
    	
        data = getData(ticket)

    	for m_date, m_open, m_close, m_high, m_low, m_volume, m_adjClose in data:
    		entry = globals()[class_name](m_date, m_open, m_high, m_low, m_close, m_volume, m_adjClose)
    		session.add(entry)
    	
    	session.commit()

I saw in the [documentation][1] that you can do 

    engine = create_engine('sqlite:///:memory:')
    
    meta = MetaData()
    
    employees = Table('employees', meta,
        Column('employee_id', Integer, primary_key=True),
        Column('employee_name', String(60), nullable=False, key='name'),
        Column('employee_dept', Integer, ForeignKey("departments.department_id"))
    )
    employees.create(engine)


However, I'm not able to figure out how to do the same thing as `Table` does, with `declarative_base()`.

How can I do that with classes that inherit from `declarative_base()`?


  [1]: http://docs.sqlalchemy.org/en/rel_0_8/core/metadata.html

How to create only one table with SQLAlchemy?

I have a set of data that I load into python using a pandas dataframe. What I would like to do is create a loop that will print a plot for all the elements in their own frame, not all on one. My data is in an excel file structured in this fashion:

    Index | DATE  | AMB CO 1 | AMB CO 2 |...|AMB CO_n | TOTAL
    1     | 1/1/12|  14      | 33       |...|  236    | 1600
    .     | ...   | ...      | ...      |...|  ...    | ...
    .     | ...   | ...      | ...      |...|  ...    | ...
    .     | ...   | ...      | ...      |...|  ...    | ...
    n

This is what I have for code so far:

    import pandas as pd
    import matplotlib.pyplot as plt
    ambdf = pd.read_excel('Ambulance.xlsx', 
                          sheetname='Sheet2', index_col=0, na_values=['NA'])
    print type(ambdf)
    print ambdf
    print ambdf['EAS']
    
    amb_plot = plt.plot(ambdf['EAS'], linewidth=2)
    plt.title('EAS Ambulance Numbers')
    plt.xlabel('Month')
    plt.ylabel('Count of Deliveries')
    print amb_plot
    
    for i in ambdf:
        print plt.plot(ambdf[i], linewidth = 2)

   
I am thinking of doing something like this:

    for i in ambdf:
        ambdf_plot = plt.plot(ambdf, linewidth = 2)

The above was not remotely what I wanted, and it stems from my unfamiliarity with Pandas, Matplotlib, etc. Looking at some documentation, though, it seems to me that matplotlib is not even needed (question B).

So A) How can I produce a plot of data for every column in my df
and B) do I need to use matplotlib or should I just use pandas to do it all?

Thank you,


Use a loop to plot n charts Python

- How do you plot a vertical line (`vlines`) in a Pandas series plot?
- I am using Pandas to plot rolling means, etc., and would like to mark important positions with a vertical line.
- Is it possible to use `vlines`, or something similar, to accomplish this?
- In this case, the x axis is `datetime`.



How do you plot a vertical line on a time series plot in Pandas?

I have a dataframe with unix times and prices in it. I want to convert the index column so that it shows in human readable dates. 

So for instance I have `date` as `1349633705` in the index column but I'd want it to show as `10/07/2012` (or at least `10/07/2012 18:15`). 

For some context, here is the code I'm working with and what I've tried already:

    import json
    import urllib2
    from datetime import datetime
    response = urllib2.urlopen('http://blockchain.info/charts/market-price?&format=json')
    data = json.load(response)   
    df = DataFrame(data['values'])
    df.columns = ["date","price"]
    #convert dates 
    df.date = df.date.apply(lambda d: datetime.strptime(d, "%Y-%m-%d"))
    df.index = df.date   

As you can see I'm using
`df.date = df.date.apply(lambda d: datetime.strptime(d, "%Y-%m-%d"))` here which doesn't work since I'm working with integers, not strings. I think I need to use `datetime.date.fromtimestamp` but I'm not quite sure how to apply this to the whole of `df.date`. 

Thanks.

Convert unix time to readable date in pandas dataframe

I am transitioning from R to Python. I just began using Pandas. I have an R code that subsets nicely:

    k1 <- subset(data, Product = p.id & Month < mn & Year == yr, select = c(Time, Product))


Now, I want to do similar stuff in Python. This is what I have got so far:

    import pandas as pd
    data = pd.read_csv("../data/monthly_prod_sales.csv")


    #first, index the dataset by Product. And, get all that matches a given 'p.id' and time.
     data.set_index('Product')
     k = data.ix[[p.id, 'Time']]
    
    # then, index this subset with Time and do more subsetting..

I am beginning to feel that I am doing this the wrong way. Perhaps there is an elegant solution. Can anyone help? I need to extract month and year from the timestamp I have and do subsetting. Perhaps there is a one-liner that will accomplish all this:

    k1 <- subset(data, Product = p.id & Time >= start_time & Time < end_time, select = c(Time, Product))



thanks.

 

| Content Type | Original Author | Original Content on Stackoverflow |
|---|---|---|
| Question | user2806761 | View Question on Stackoverflow |
| Solution 1 - Python | Woody Pride | View Answer on Stackoverflow |
| Solution 2 - Python | yemu | View Answer on Stackoverflow |
| Solution 3 - Python | Amruth Lakkavaram | View Answer on Stackoverflow |
| Solution 4 - Python | user42 | View Answer on Stackoverflow |
| Solution 5 - Python | pylang | View Answer on Stackoverflow |
| Solution 6 - Python | Loochie | View Answer on Stackoverflow |
