python pandas dataframe slicing by date conditions

PythonDataframePandas

Python Problem Overview


I am able to read and slice pandas dataframe using python datetime objects, however I am forced to use only existing dates in index. For example, this works:

>>> data
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 252 entries, 2010-12-31 00:00:00 to 2010-04-01 00:00:00
Data columns:
Adj Close    252  non-null values
dtypes: float64(1)

>>> st = datetime.datetime(2010, 12, 31, 0, 0)
>>> en = datetime.datetime(2010, 12, 28, 0, 0)

>>> data[st:en]
            Adj Close
Date                 
2010-12-31     593.97
2010-12-30     598.86
2010-12-29     601.00
2010-12-28     598.92

However if I use a start or end date that is not present in the DF, I get python KeyError.

My Question : How do I query the dataframe object for a date range; even when the start and end dates are not present in the DataFrame. Does pandas allow for range based slicing?

I am using pandas version 0.10.1

Python Solutions


Solution 1 - Python

Use searchsorted to find the nearest times first, and then use it to slice.

In [15]: df = pd.DataFrame([1, 2, 3], index=[dt.datetime(2013, 1, 1), dt.datetime(2013, 1, 3), dt.datetime(2013, 1, 5)])

In [16]: df
Out[16]: 
            0
2013-01-01  1
2013-01-03  2
2013-01-05  3

In [22]: start = df.index.searchsorted(dt.datetime(2013, 1, 2))

In [23]: end = df.index.searchsorted(dt.datetime(2013, 1, 4))

In [24]: df.iloc[start:end]
Out[24]: 
            0
2013-01-03  2

Solution 2 - Python

Short answer: Sort your data (data.sort()) and then I think everything will work the way you are expecting.

Yes, you can slice using datetimes not present in the DataFrame. For example:

In [12]: df
Out[12]: 
                   0
2013-04-20  1.120024
2013-04-21 -0.721101
2013-04-22  0.379392
2013-04-23  0.924535
2013-04-24  0.531902
2013-04-25 -0.957936

In [13]: df['20130419':'20130422']
Out[13]: 
                   0
2013-04-20  1.120024
2013-04-21 -0.721101
2013-04-22  0.379392

As you can see, you don't even have to build datetime objects; strings work.

Because the datetimes in your index are not sequential, the behavior is weird. If we shuffle the index of my example here...

In [17]: df
Out[17]: 
                   0
2013-04-22  1.120024
2013-04-20 -0.721101
2013-04-24  0.379392
2013-04-23  0.924535
2013-04-21  0.531902
2013-04-25 -0.957936

...and take the same slice, we get a different result. It returns the first element inside the range and stops at the first element outside the range.

In [18]: df['20130419':'20130422']
Out[18]: 
                   0
2013-04-22  1.120024
2013-04-20 -0.721101
2013-04-24  0.379392

This is probably not useful behavior. If you want to select ranges of dates, would it make sense to sort it by date first?

df.sort_index()

Solution 3 - Python

You can use a simple mask to accomplish this:

date_mask = (data.index > start) & (data.index < end)
dates = data.index[date_mask]
data.ix[dates]

By the way, this works for hierarchical indexing as well. In that case data.index would be replaced with data.index.levels[0] or similar.

Solution 4 - Python

I had difficulty with other approaches but I found that the following approach worked for me:

# Set the Index to be the Date
df['Date'] = pd.to_datetime(df['Date_1'], format='%d/%m/%Y')
df.set_index('Date', inplace=True)

# Sort the Data
df = df.sort_values('Date_1')

# Slice the Data
From = '2017-05-07'
To   = '2017-06-07'
df_Z = df.loc[From:To,:]

Solution 5 - Python

target_date = pd.Timestamp('2020-05-10')

df[df['DATE'].dt.date == (dt.date(target_date.year, target_date.month, target_date.day))]

This line will let you select all the dates that matches the condition, in case you don't neet to set "Date Column" as index.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionRishabh SagarView Question on Stackoverflow
Solution 1 - PythonwaitingkuoView Answer on Stackoverflow
Solution 2 - PythonDan AllanView Answer on Stackoverflow
Solution 3 - PythonwatsonicView Answer on Stackoverflow
Solution 4 - PythonR. CoxView Answer on Stackoverflow
Solution 5 - Pythonfega_zeroView Answer on Stackoverflow