How to move pandas data from index to column after multiple groupby


Python Problem Overview


I have the following pandas dataframe:

       token    year  uses  books
386    xanthos  1830     3      3
387    xanthos  1840     1      1
388    xanthos  1840     2      2
389    xanthos  1868     2      2
390    xanthos  1875     1      1

I aggregate the rows with duplicate token and years like so:

dfalph = dfalph[['token','year','uses','books']].groupby(['token', 'year']).agg([np.sum])
dfalph.columns = dfalph.columns.droplevel(1)

               uses  books
token    year
xanthos  1830     3      3
         1840     3      3
         1868     2      2
         1875     1      1

Instead of having the 'token' and 'year' fields in the index, I would like to return them to columns and have an integer index.

Python Solutions


Solution 1 - Python

Method #1: reset_index()

>>> g
              uses  books
               sum    sum
token   year             
xanthos 1830     3      3
        1840     3      3
        1868     2      2
        1875     1      1

[4 rows x 2 columns]
>>> g = g.reset_index()
>>> g
     token  year  uses  books
                   sum    sum
0  xanthos  1830     3      3
1  xanthos  1840     3      3
2  xanthos  1868     2      2
3  xanthos  1875     1      1

[4 rows x 4 columns]

Method #2: don't make the index in the first place, using as_index=False

>>> g = dfalph[['token', 'year', 'uses', 'books']].groupby(['token', 'year'], as_index=False).sum()
>>> g
     token  year  uses  books
0  xanthos  1830     3      3
1  xanthos  1840     3      3
2  xanthos  1868     2      2
3  xanthos  1875     1      1

[4 rows x 4 columns]
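
To check both methods side by side, here is a minimal sketch that rebuilds the five sample rows from the question (the values are copied from its printed table, and the names via_reset and via_as_index are only illustrative) and compares the two results:

```python
import pandas as pd

# Sample rows copied from the question's printed table.
dfalph = pd.DataFrame({
    "token": ["xanthos"] * 5,
    "year": [1830, 1840, 1840, 1868, 1875],
    "uses": [3, 1, 2, 2, 1],
    "books": [3, 1, 2, 2, 1],
})

# Method #1: aggregate first, then move the index levels back into columns.
via_reset = dfalph.groupby(["token", "year"]).sum().reset_index()

# Method #2: never build the MultiIndex in the first place.
via_as_index = dfalph.groupby(["token", "year"], as_index=False).sum()

print(via_reset)
print(via_reset.equals(via_as_index))
```

For plain column keys like these, both routes should end with token and year as ordinary columns and a default integer index.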

Solution 2 - Python

I differ from the accepted answer. While there are two ways to do this, they will not necessarily produce the same output, especially when you use a Grouper in the groupby:

  • as_index=False
  • reset_index()

example df

+---------+---------+-------------+------------+
| column1 | column2 | column_date | column_sum |
+---------+---------+-------------+------------+
| A       | M       | 26-10-2018  |          2 |
| B       | M       | 28-10-2018  |          3 |
| A       | M       | 30-10-2018  |          6 |
| B       | M       | 01-11-2018  |          3 |
| C       | N       | 03-11-2018  |          4 |
+---------+---------+-------------+------------+

They do not work the same way.

df = df.groupby(
    by=[
        'column1',
        'column2',
        pd.Grouper(key='column_date', freq='M')
    ],
    as_index=False
).sum()

The above will give

+---------+---------+------------+
| column1 | column2 | column_sum |
+---------+---------+------------+
| A       | M       |          8 |
| B       | M       |          3 |
| B       | M       |          3 |
| C       | N       |          4 |
+---------+---------+------------+

While this:

df = df.groupby(
    by=[
        'column1',
        'column2',
        pd.Grouper(key='column_date', freq='M')
    ]
).sum().reset_index()

will give:

+---------+---------+-------------+------------+
| column1 | column2 | column_date | column_sum |
+---------+---------+-------------+------------+
| A       | M       | 31-10-2018  |          8 |
| B       | M       | 31-10-2018  |          3 |
| B       | M       | 30-11-2018  |          3 |
| C       | N       | 30-11-2018  |          4 |
+---------+---------+-------------+------------+
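
Whether as_index=False keeps the Grouper column may depend on the installed pandas version, so it is worth checking locally. The following is a sketch under assumptions: the rows are copied from the example table above, the dates are parsed as day-month-year, and an explicit pd.offsets.MonthEnd() object stands in for freq='M' because the string alias for month-end differs between pandas releases. The helper make_keys is hypothetical and only exists to build a fresh Grouper for each call.

```python
import pandas as pd

# Example rows copied from the answer's table; dates are day-month-year.
df = pd.DataFrame({
    "column1": ["A", "B", "A", "B", "C"],
    "column2": ["M", "M", "M", "M", "N"],
    "column_date": pd.to_datetime(
        ["26-10-2018", "28-10-2018", "30-10-2018", "01-11-2018", "03-11-2018"],
        format="%d-%m-%Y",
    ),
    "column_sum": [2, 3, 6, 3, 4],
})


def make_keys():
    """Build the grouping keys; month-end bins are an assumption standing in for freq='M'."""
    return ["column1", "column2",
            pd.Grouper(key="column_date", freq=pd.offsets.MonthEnd())]


with_flag = df.groupby(make_keys(), as_index=False).sum()
with_reset = df.groupby(make_keys()).sum().reset_index()

# Compare which columns survive in each result on the installed version.
print(pd.__version__)
print(list(with_flag.columns))
print(list(with_reset.columns))
```

The reset_index() route always carries column_date through, because the month-end labels live in the index before they are moved back into columns.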

Solution 3 - Python

If you want to discard the index levels entirely rather than turn them into columns, add drop=True:

df.reset_index(drop=True)

df = df.groupby(
    by=[
        'column1',
        'column2',
        pd.Grouper(key='column_date', freq='M')
    ]
).sum().reset_index(drop=True)
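
Note that drop=True throws the former index levels away instead of converting them to columns. A minimal sketch, using a hypothetical two-row frame g shaped like the question's aggregated result, shows the difference:

```python
import pandas as pd

# Hypothetical two-row frame, already aggregated and indexed by token and year.
g = pd.DataFrame(
    {"uses": [3, 3], "books": [3, 3]},
    index=pd.MultiIndex.from_tuples(
        [("xanthos", 1830), ("xanthos", 1840)], names=["token", "year"]
    ),
)

kept = g.reset_index()              # token and year become ordinary columns
dropped = g.reset_index(drop=True)  # token and year are discarded

print(list(kept.columns))
print(list(dropped.columns))
```

If the goal is to keep token and year as columns, as in the question, the plain reset_index() call is the one to use.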

Solution 4 - Python

If you have a MultiIndex and want to reset only a specific index level, you can use the level parameter of reset_index. For example:

import numpy as np
import pandas as pd
index = pd.MultiIndex.from_tuples([('one', 'a'), ('one', 'b'), ('two', 'a'), ('two', 'b')], names=['A', 'B'])
df = pd.DataFrame(np.arange(1.0, 5.0), index=index, columns=['C'])

        C
A   B     
one a  1.0
    b  2.0
two a  3.0
    b  4.0

Reset the first level:

df.reset_index(level=0)

Output:

     A    C
B          
a  one  1.0
b  one  2.0
a  two  3.0
b  two  4.0

Reset the second level:

df.reset_index(level=1)

Output:

     B    C
A          
one  a  1.0
one  b  2.0
two  a  3.0
two  b  4.0
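
The level parameter also accepts level names and lists of levels. As a small sketch built on the same example frame (the variable names by_name and both are only illustrative):

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("one", "a"), ("one", "b"), ("two", "a"), ("two", "b")], names=["A", "B"]
)
df = pd.DataFrame(np.arange(1.0, 5.0), index=index, columns=["C"])

by_name = df.reset_index(level="B")      # same effect as level=1
both = df.reset_index(level=["A", "B"])  # same effect as a plain reset_index()

print(by_name)
print(both)
```

Referring to levels by name tends to be easier to read than positional numbers when the index has more than two levels.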

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type          Original Author    Original Content on Stackoverflow
Question              prooffreader       View Question on Stackoverflow
Solution 1 - Python   DSM                View Answer on Stackoverflow
Solution 2 - Python   Adarsh Madrecha    View Answer on Stackoverflow
Solution 3 - Python   user1809802        View Answer on Stackoverflow
Solution 4 - Python   Mykola Zotko       View Answer on Stackoverflow