Python - rolling functions for GroupBy object

Python Problem Overview


I have a grouped time series object of type <pandas.core.groupby.SeriesGroupBy object at 0x03F1A9F0>. grouped.sum() gives the desired result, but I cannot get rolling_sum to work with the groupby object. Is there any way to apply rolling functions to groupby objects? For example:

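import pandas as pd

x = range(0, 6)
id = ['a', 'a', 'a', 'b', 'b', 'b']
# list(...) materializes the zip iterator so the constructor sees row tuples
df = pd.DataFrame(list(zip(id, x)), columns=['id', 'x'])
df.groupby('id').sum()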
id    x
a    3
b   12

However, I would like to have something like:

  id  x
0  a  0
1  a  1
2  a  3
3  b  3
4  b  7
5  b  12

Python Solutions


Solution 1 - Python

For the Googlers who come upon this old question:

Regarding @kekert's comment on @Garrett's answer to use the new

df.groupby('id')['x'].rolling(2).mean()

rather than the now-deprecated

df.groupby('id')['x'].apply(pd.rolling_mean, 2, min_periods=1)

curiously, the new .rolling().mean() approach returns a multi-indexed Series, indexed first by the groupby column and then by the original index, whereas the old approach returned a Series indexed only by the original df index. The old behavior perhaps makes less sense, but it made it very convenient to add that Series as a new column to the original DataFrame.

So I think I've figured out a solution that uses the new rolling() method and still works the same:

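df.groupby('id')['x'].rolling(2, min_periods=1).mean().reset_index(0, drop=True)

(min_periods=1 is needed to match the old call; without it, the first row of each group is NaN.) This should give you the series

0    0.0
1    0.5
2    1.5
3    3.0
4    3.5
5    4.5

which you can add as a new column (named x_rolling_mean here, for example):

df['x_rolling_mean'] = df.groupby('id')['x'].rolling(2, min_periods=1).mean().reset_index(0, drop=True)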
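
As a self-contained sketch of the steps above (the column name x_rolling_mean is my own choice, and min_periods=1 is assumed so that the first row of each group is not NaN), this shows the two-level index and how dropping the group level restores alignment with df:

```python
import pandas as pd

df = pd.DataFrame({'id': ['a', 'a', 'a', 'b', 'b', 'b'], 'x': range(6)})

# Grouped rolling mean: the result is indexed by (id, original index).
rolled = df.groupby('id')['x'].rolling(2, min_periods=1).mean()
print(rolled.index.nlevels)

# Drop the 'id' level (level 0) so the index matches df.index again.
flat = rolled.reset_index(level=0, drop=True)

# Column assignment aligns on the index, so the result can be added directly.
df['x_rolling_mean'] = flat
print(df)
```

Because the assignment aligns on index labels rather than on position, this should also hold when the groups are not stored contiguously in df.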

Solution 2 - Python

cumulative sum

To answer the question directly, the cumsum method produces the desired series:

In [17]: df
Out[17]:
  id  x
0  a  0
1  a  1
2  a  2
3  b  3
4  b  4
5  b  5

In [18]: df.groupby('id').x.cumsum()
Out[18]:
0     0
1     1
2     3
3     3
4     7
5    12
Name: x, dtype: int64

pandas rolling functions per group

More generally, any rolling function can be applied to each group as follows (using the new .rolling method, as commented by @kekert). Note that the return type is a multi-indexed Series, unlike the result of the older (deprecated) pd.rolling_* functions.

In [10]: df.groupby('id')['x'].rolling(2, min_periods=1).sum()
Out[10]:
id
a   0   0.00
    1   1.00
    2   3.00
b   3   3.00
    4   7.00
    5   9.00
Name: x, dtype: float64

To apply the per-group rolling function and get the result back in the original DataFrame's order, use transform instead:

In [16]: df.groupby('id')['x'].transform(lambda s: s.rolling(2, min_periods=1).sum())
Out[16]:
0    0
1    1
2    3
3    3
4    7
5    9
Name: x, dtype: int64
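
The difference in ordering is easier to see when the groups are interleaved. The following is a minimal sketch using made-up data (the interleaved id column is my own assumption, not the question's frame) that compares the two calls:

```python
import pandas as pd

# Hypothetical frame in which rows of groups 'a' and 'b' alternate.
df = pd.DataFrame({'id': ['a', 'b', 'a', 'b', 'a', 'b'], 'x': range(6)})
grouped = df.groupby('id')['x']

# Grouped .rolling: rows come back ordered by group, under an (id, index) MultiIndex.
multi = grouped.rolling(2, min_periods=1).sum()

# transform: rows come back under the original index, in the original order.
flat = grouped.transform(lambda s: s.rolling(2, min_periods=1).sum())

print(multi)
print(flat)
```

Here multi lists all of group a before group b, while flat follows the row order of df, so flat can be assigned as a column without any index manipulation.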

deprecated approach

For reference, here's how the now-deprecated pandas.rolling_mean (since removed from pandas) behaved:

In [16]: df.groupby('id')['x'].apply(pd.rolling_mean, 2, min_periods=1)
Out[16]: 
0    0.0
1    0.5
2    1.5
3    3.0
4    3.5
5    4.5

Solution 3 - Python

Here is another way that generalizes well and uses pandas' expanding method.

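An expanding window covers every row from the start of the group up to the current row. The same transform pattern also works for fixed-size windows, such as those used with time series, if you call rolling(n) in place of expanding().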

# Import pandas library
import pandas as pd

# Prepare columns
x = range(0, 6)
id = ['a', 'a', 'a', 'b', 'b', 'b']

# Create dataframe from columns above
df = pd.DataFrame({'id':id, 'x':x})

# Calculate rolling sum with infinite window size (i.e. all rows in group) using "expanding"
df['rolling_sum'] = df.groupby('id')['x'].transform(lambda x: x.expanding().sum())

# Output as desired by original poster
print(df)
  id  x  rolling_sum
0  a  0            0
1  a  1            1
2  a  2            3
3  b  3            3
4  b  4            7
5  b  5           12
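
To make the relationship between the window types concrete, here is a small sketch (the variable names and the window size of 2 are my own choices) that computes an expanding sum, the grouped cumsum, and a fixed two-row rolling sum on the same data:

```python
import pandas as pd

df = pd.DataFrame({'id': ['a', 'a', 'a', 'b', 'b', 'b'], 'x': range(6)})
grouped = df.groupby('id')['x']

# Expanding window: all rows of the group up to and including the current one.
expanding_sum = grouped.transform(lambda s: s.expanding().sum())

# For a plain sum, the built-in grouped cumsum gives the same values.
cumulative_sum = grouped.cumsum()

# Fixed window of 2 rows: same pattern with rolling() instead of expanding().
rolling2_sum = grouped.transform(lambda s: s.rolling(2, min_periods=1).sum())

result = df.assign(expanding_sum=expanding_sum,
                   cumsum=cumulative_sum,
                   rolling2_sum=rolling2_sum)
print(result)
```

The expanding and cumulative columns agree on every row; the fixed window differs only on the last row of group b, where it no longer includes the first value of the group.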

Solution 4 - Python

I'm not sure of the mechanics, but this works. Note that in the pandas version I tested, the returned value was just an ndarray (recent versions return a Series with the original index). I think you could apply any cumulative or "rolling" function in this manner and get the same kind of result.

I have tested it with cumprod, cummax and cummin, and they all returned an ndarray. I think pandas is smart enough to know that these functions return a Series, so the function is applied as a transformation rather than an aggregation.

In [35]: df.groupby('id')['x'].cumsum()
Out[35]:
0     0
1     1
2     3
3     3
4     7
5    12

Edit: I found it curious that this syntax does return a Series:

In [54]: df.groupby('id')['x'].transform('cumsum')
Out[54]:
0     0
1     1
2     3
3     3
4     7
5    12
Name: x
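
As a quick check of the other cumulative functions mentioned above, here is a sketch that calls each one directly and through transform. It assumes a reasonably recent pandas, where both forms return a Series; the results dictionary is just a helper of mine.

```python
import pandas as pd

df = pd.DataFrame({'id': ['a', 'a', 'a', 'b', 'b', 'b'], 'x': range(6)})
grouped = df.groupby('id')['x']

results = {}
for name in ['cumsum', 'cumprod', 'cummax', 'cummin']:
    direct = getattr(grouped, name)()      # e.g. grouped.cumsum()
    via_transform = grouped.transform(name)  # e.g. grouped.transform('cumsum')
    results[name] = direct
    # Show the return type and whether both forms produce the same values.
    print(name, type(direct).__name__, direct.tolist(),
          direct.tolist() == via_transform.tolist())
```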

Solution 5 - Python

If you need to assign the result of a grouped rolling function back to the original DataFrame while keeping its order and groups, you can use the transform function.

df.sort_values(by='date', inplace=True)
grpd = df.groupby('group_key')
# center=False (the default) places each result on the window's last row
df['val_rolling_7_mean'] = grpd['val'].transform(lambda x: x.rolling(7, center=False).mean())
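
The snippet above assumes a DataFrame with date, group_key and val columns. Here is a runnable sketch of the same pattern on made-up data; the sample values, the window of 2 rows (instead of 7, to suit the tiny frame) and min_periods=1 are all my own assumptions:

```python
import pandas as pd

# Hypothetical sample data; rows are deliberately not in date order.
df = pd.DataFrame({
    'date': pd.to_datetime(['2021-01-03', '2021-01-01', '2021-01-02',
                            '2021-01-02', '2021-01-03', '2021-01-01']),
    'group_key': ['a', 'a', 'a', 'b', 'b', 'b'],
    'val': [30.0, 10.0, 20.0, 2.0, 3.0, 1.0],
})

# Sort first so that each group's window runs in chronological order.
df = df.sort_values(by='date')

# Window of 2 rows; min_periods=1 avoids NaN on the first row of each group.
df['val_rolling_2_mean'] = (
    df.groupby('group_key')['val']
      .transform(lambda s: s.rolling(2, min_periods=1).mean())
)
print(df.sort_values(['group_key', 'date']))
```

Since transform returns a Series with the same index as df, each rolling mean lands on the row it was computed for, whatever the row order was before sorting.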

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type          Original Author   Original Content on Stack Overflow
Question              user1642513       View Question on Stack Overflow
Solution 1 - Python   Kevin Wang        View Answer on Stack Overflow
Solution 2 - Python   Garrett           View Answer on Stack Overflow
Solution 3 - Python   Sean McCarthy     View Answer on Stack Overflow
Solution 4 - Python   Zelazny7          View Answer on Stack Overflow
Solution 5 - Python   yoav_aaa          View Answer on Stack Overflow