Pandas equivalent of Oracle Lead/Lag function

Python Problem Overview

I'm new to pandas, but I'm already falling in love with it. I'm trying to implement the equivalent of Oracle's LAG function.

Let's suppose you have this DataFrame:

Date                   Group      Data
2014-05-14 09:10:00        A         1
2014-05-14 09:20:00        A         2
2014-05-14 09:30:00        A         3
2014-05-14 09:40:00        A         4
2014-05-14 09:50:00        A         5
2014-05-14 10:00:00        B         1
2014-05-14 10:10:00        B         2
2014-05-14 10:20:00        B         3
2014-05-14 10:30:00        B         4

If this were an Oracle database and I wanted a lag grouped by the "Group" column and ordered by Date, I could simply use this function:

 LAG(Data,1,NULL) OVER (PARTITION BY Group ORDER BY Date ASC) AS Data_lagged

This would result in the following table:

Date                   Group     Data    Data_lagged
2014-05-14 09:10:00        A        1           Null
2014-05-14 09:20:00        A        2            1
2014-05-14 09:30:00        A        3            2
2014-05-14 09:40:00        A        4            3
2014-05-14 09:50:00        A        5            4
2014-05-14 10:00:00        B        1           Null
2014-05-14 10:10:00        B        2            1
2014-05-14 10:20:00        B        3            2
2014-05-14 10:30:00        B        4            3

In pandas I can set the date to be an index and use the shift method:

db["Data_lagged"] = db.Data.shift(1)

The only issue is that this doesn't group by a column. Even if I set both Date and Group as the index, the shift still crosses the group boundary: the first row of group B gets the "5" from the last row of group A instead of a null.

Is there a way to implement the equivalent of the LEAD and LAG functions in pandas?

Python Solutions

Solution 1 - Python

You could perform a groupby/apply (shift) operation:

In [15]: df['Data_lagged'] = df.groupby(['Group'])['Data'].shift(1)

In [16]: df
Out[16]: 
                Date Group  Data  Data_lagged
2014-05-14  09:10:00     A     1          NaN
2014-05-14  09:20:00     A     2            1
2014-05-14  09:30:00     A     3            2
2014-05-14  09:40:00     A     4            3
2014-05-14  09:50:00     A     5            4
2014-05-14  10:00:00     B     1          NaN
2014-05-14  10:10:00     B     2            1
2014-05-14  10:20:00     B     3            2
2014-05-14  10:30:00     B     4            3

[9 rows x 4 columns]

To obtain the ORDER BY Date ASC effect, you must sort the DataFrame first:

df['Data_lagged'] = (df.sort_values(by=['Date'], ascending=True)
                       .groupby(['Group'])['Data'].shift(1))
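
As a minimal, self-contained sketch of the sort-then-shift approach above: the code below rebuilds the question's sample data (the use of date_range and the deliberate shuffle are my own assumptions, added only to show that the sort matters). Because the assignment aligns on the index, df keeps its original row order while each lagged value follows the Date order within its group.

```python
import pandas as pd

# Sample data from the question: 5 rows for group A, then 4 rows for group B,
# at 10-minute intervals starting 2014-05-14 09:10.
df = pd.DataFrame({
    "Date": pd.date_range("2014-05-14 09:10", periods=9, freq="10min"),
    "Group": list("AAAAABBBB"),
    "Data": [1, 2, 3, 4, 5, 1, 2, 3, 4],
})

# Shuffle the rows (illustration only) so that the sort step actually matters.
df = df.sample(frac=1, random_state=0)

# Sort by Date, then shift within each Group. The result keeps the original
# index labels, so assigning it back to df aligns each value with its row.
df["Data_lagged"] = (df.sort_values(by=["Date"], ascending=True)
                       .groupby(["Group"])["Data"].shift(1))

# View the result in Date order for readability.
result = df.sort_values("Date").reset_index(drop=True)
print(result)
```

The first row of each group ends up as NaN, which plays the role of Oracle's NULL here.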

Solution 2 - Python

For a lead operation in pandas, just use shift(-1) instead of shift(1):

df['Data_lead'] = df.groupby(['Group'])['Data'].shift(-1)
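
To see lead and lag side by side, here is a small sketch on made-up data (the frame and the column names Data_lag and Data_lag2_or_0 are hypothetical). It also shows one possible way to mimic the offset and default arguments of LAG(Data, 2, 0), assuming a pandas version in which groupby shift accepts fill_value.

```python
import pandas as pd

# Hypothetical toy data, assumed to be already in the desired order.
df = pd.DataFrame({
    "Group": list("AAABB"),
    "Data": [1, 2, 3, 1, 2],
})

grouped = df.groupby("Group")["Data"]

# Lag: value from the previous row within the same group.
df["Data_lag"] = grouped.shift(1)

# Lead: value from the next row within the same group.
df["Data_lead"] = grouped.shift(-1)

# Offset of 2 with a default of 0 instead of NaN, roughly LAG(Data, 2, 0).
df["Data_lag2_or_0"] = grouped.shift(2, fill_value=0)

print(df)
```

Rows with no previous or next row in their group get NaN, unless a fill_value is supplied.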

Content Type	Original Author	Original Content on Stackoverflow
Question	gcarmiol	View Question on Stackoverflow
Solution 1 - Python	unutbu	View Answer on Stackoverflow
Solution 2 - Python	Rahul Mehta	View Answer on Stackoverflow
