Pandas: drop a level from a multi-level column index?

PythonPandas

Python Problem Overview


If I've got a multi-level column index:

>>> cols = pd.MultiIndex.from_tuples([("a", "b"), ("a", "c")])
>>> pd.DataFrame([[1,2], [3,4]], columns=cols)

a
---+--
b | c
--+---+--
0 | 1 | 2
1 | 3 | 4

How can I drop the "a" level of that index, so I end up with:

b | c
--+---+--
0 | 1 | 2
1 | 3 | 4

Python Solutions


Solution 1 - Python

You can use MultiIndex.droplevel:

>>> cols = pd.MultiIndex.from_tuples([("a", "b"), ("a", "c")])
>>> df = pd.DataFrame([[1,2], [3,4]], columns=cols)
>>> df
   a   
   b  c
0  1  2
1  3  4

[2 rows x 2 columns]
>>> df.columns = df.columns.droplevel()
>>> df
   b  c
0  1  2
1  3  4

[2 rows x 2 columns]

Solution 2 - Python

As of Pandas 0.24.0, we can now use DataFrame.droplevel():

cols = pd.MultiIndex.from_tuples([("a", "b"), ("a", "c")])
df = pd.DataFrame([[1,2], [3,4]], columns=cols)

df.droplevel(0, axis=1) 

#   b  c
#0  1  2
#1  3  4

This is very useful if you want to keep your DataFrame method-chain rolling.

Solution 3 - Python

Another way to drop the index is to use a list comprehension:

df.columns = [col[1] for col in df.columns]

   b  c
0  1  2
1  3  4

This strategy is also useful if you want to combine the names from both levels like in the example below where the bottom level contains two 'y's:

cols = pd.MultiIndex.from_tuples([("A", "x"), ("A", "y"), ("B", "y")])
df = pd.DataFrame([[1,2, 8 ], [3,4, 9]], columns=cols)

   A     B
   x  y  y
0  1  2  8
1  3  4  9

Dropping the top level would leave two columns with the index 'y'. That can be avoided by joining the names with the list comprehension.

df.columns = ['_'.join(col) for col in df.columns]

	A_x	A_y	B_y
0	1	2	8
1	3	4	9

That's a problem I had after doing a groupby and it took a while to find this other question that solved it. I adapted that solution to the specific case here.

Solution 4 - Python

Another way to do this is to reassign df based on a cross section of df, using the .xs method.

>>> df

	a
    b	c
0	1	2
1	3	4

>>> df = df.xs('a', axis=1, drop_level=True)

    # 'a' : key on which to get cross section
    # axis=1 : get cross section of column
    # drop_level=True : returns cross section without the multilevel index

>>> df

    b	c
0	1	2
1	3	4

Solution 5 - Python

You could also achieve that by renaming the columns:

df.columns = ['a', 'b']

This involves a manual step but could be an option especially if you would eventually rename your data frame.

Solution 6 - Python

A small trick using sum with level=1(work when level=1 is all unique)

df.sum(level=1,axis=1)
Out[202]: 
   b  c
0  1  2
1  3  4

More common solution get_level_values

df.columns=df.columns.get_level_values(1)
df
Out[206]: 
   b  c
0  1  2
1  3  4

Solution 7 - Python

I have struggled with this problem since I don’t know why my droplevel() function does not work. Work through several and learn that ‘a’ in your table is columns name and ‘b’, ‘c’ are index. Do like this will help

df.columns.name = None
df.reset_index() #make index become label

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionDavid WoleverView Question on Stackoverflow
Solution 1 - PythonDSMView Answer on Stackoverflow
Solution 2 - PythonjxcView Answer on Stackoverflow
Solution 3 - PythonMintView Answer on Stackoverflow
Solution 4 - PythonspacetyperView Answer on Stackoverflow
Solution 5 - PythonsedehView Answer on Stackoverflow
Solution 6 - PythonBENYView Answer on Stackoverflow
Solution 7 - PythondhFrankView Answer on Stackoverflow