Rolling Mean on pandas on a specific column

PythonPython 3.xPandasDataframe

Python Problem Overview


I have a data frame like this which is imported from a CSV.

              stock  pop
Date
2016-01-04  325.316   82
2016-01-11  320.036   83
2016-01-18  299.169   79
2016-01-25  296.579   84
2016-02-01  295.334   82
2016-02-08  309.777   81
2016-02-15  317.397   75
2016-02-22  328.005   80
2016-02-29  315.504   81
2016-03-07  328.802   81
2016-03-14  339.559   86
2016-03-21  352.160   82
2016-03-28  348.773   84
2016-04-04  346.482   83
2016-04-11  346.980   80
2016-04-18  357.140   75
2016-04-25  357.439   77
2016-05-02  356.443   78
2016-05-09  365.158   78
2016-05-16  352.160   72
2016-05-23  344.540   74
2016-05-30  354.998   81
2016-06-06  347.428   77
2016-06-13  341.053   78
2016-06-20  363.515   80
2016-06-27  349.669   80
2016-07-04  371.583   82
2016-07-11  358.335   81
2016-07-18  362.021   79
2016-07-25  368.844   77
...             ...  ...

I wanted to add a new column MA which calculates Rolling mean for the column pop. I tried the following

df['MA']=data.rolling(5,on='pop').mean()

I get an error

ValueError: Wrong number of items passed 2, placement implies 1

So I thought let me try if it just works without adding a column. I used

 data.rolling(5,on='pop').mean()

I got the output

               stock  pop
Date
2016-01-04       NaN   82
2016-01-11       NaN   83
2016-01-18       NaN   79
2016-01-25       NaN   84
2016-02-01  307.2868   82
2016-02-08  304.1790   81
2016-02-15  303.6512   75
2016-02-22  309.4184   80
2016-02-29  313.2034   81
2016-03-07  319.8970   81
2016-03-14  325.8534   86
2016-03-21  332.8060   82
2016-03-28  336.9596   84
2016-04-04  343.1552   83
2016-04-11  346.7908   80
2016-04-18  350.3070   75
2016-04-25  351.3628   77
2016-05-02  352.8968   78
2016-05-09  356.6320   78
2016-05-16  357.6680   72
2016-05-23  355.1480   74
2016-05-30  354.6598   81
2016-06-06  352.8568   77
2016-06-13  348.0358   78
2016-06-20  350.3068   80
2016-06-27  351.3326   80
2016-07-04  354.6496   82
2016-07-11  356.8310   81
2016-07-18  361.0246   79
2016-07-25  362.0904   77
...              ...  ...

I can't seem to apply Rolling mean on the column pop. What am I doing wrong?

Python Solutions


Solution 1 - Python

To assign a column, you can create a rolling object based on your Series:

df['new_col'] = data['column'].rolling(5).mean()

The answer posted by ac2001 is not the most performant way of doing this. He is calculating a rolling mean on every column in the dataframe, then he is assigning the "ma" column using the "pop" column. The first method of the following is much more efficient:

%timeit df['ma'] = data['pop'].rolling(5).mean()
%timeit df['ma_2'] = data.rolling(5).mean()['pop']

1000 loops, best of 3: 497 µs per loop
100 loops, best of 3: 2.6 ms per loop

I would not recommend using the second method unless you need to store computed rolling means on all other columns.

Solution 2 - Python

Edit: pd.rolling_mean is deprecated in pandas and will be removed in future. Instead: Using pd.rolling you can do:

df['MA'] = df['pop'].rolling(window=5,center=False).mean()

for a dataframe df:

          Date    stock  pop
0   2016-01-04  325.316   82
1   2016-01-11  320.036   83
2   2016-01-18  299.169   79
3   2016-01-25  296.579   84
4   2016-02-01  295.334   82
5   2016-02-08  309.777   81
6   2016-02-15  317.397   75
7   2016-02-22  328.005   80
8   2016-02-29  315.504   81
9   2016-03-07  328.802   81

To get:

          Date    stock  pop    MA
0   2016-01-04  325.316   82   NaN
1   2016-01-11  320.036   83   NaN
2   2016-01-18  299.169   79   NaN
3   2016-01-25  296.579   84   NaN
4   2016-02-01  295.334   82  82.0
5   2016-02-08  309.777   81  81.8
6   2016-02-15  317.397   75  80.2
7   2016-02-22  328.005   80  80.4
8   2016-02-29  315.504   81  79.8
9   2016-03-07  328.802   81  79.6

Documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.rolling.html

Old: Although it is deprecated you can use:

df['MA']=pd.rolling_mean(df['pop'], window=5)

to get:

          Date    stock  pop    MA
0   2016-01-04  325.316   82   NaN
1   2016-01-11  320.036   83   NaN
2   2016-01-18  299.169   79   NaN
3   2016-01-25  296.579   84   NaN
4   2016-02-01  295.334   82  82.0
5   2016-02-08  309.777   81  81.8
6   2016-02-15  317.397   75  80.2
7   2016-02-22  328.005   80  80.4
8   2016-02-29  315.504   81  79.8
9   2016-03-07  328.802   81  79.6

Documentation: http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.rolling_mean.html

Solution 3 - Python

This solution worked for me.

data['MA'] = data.rolling(5).mean()['pop']

I think the issue may be that the on='pop' is just changing the column to perform the rolling window from the index.

From the doc string: " For a DataFrame, column on which to calculate the rolling window, rather than the index"

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionAnti21View Question on Stackoverflow
Solution 1 - PythonAndrew LView Answer on Stackoverflow
Solution 2 - PythonChuckView Answer on Stackoverflow
Solution 3 - Pythonac2001View Answer on Stackoverflow