Get unique values from index column in MultiIndex

Python Problem Overview


I know that I can get the unique values of a MultiIndex level by resetting the index and calling unique() on the resulting column, but is there a way to avoid this step and get the unique values directly?

Given I have:

        C
 A B     
 0 one  3
 1 one  2
 2 two  1

I can do:

df = df.reset_index()
uniq_b = df.B.unique()
df = df.set_index(['A','B'])

Is there a built-in way to do this in pandas?

Python Solutions


Solution 1 - Python

One way is to use index.levels:

In [11]: df
Out[11]: 
       C
A B     
0 one  3
1 one  2
2 two  1

In [12]: df.index.levels[1]
Out[12]: Index([one, two], dtype=object)
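
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the question's frame and reads the second level. The construction via set_index is an assumption about how the frame was made; the variable names are only illustrative.

```python
import pandas as pd

# Rebuild the question's frame: index levels A and B, one data column C.
df = pd.DataFrame(
    {"A": [0, 1, 2], "B": ["one", "one", "two"], "C": [3, 2, 1]}
).set_index(["A", "B"])

# A level can be looked up by position...
by_position = df.index.levels[1]

# ...or by name, by first finding the position of that name.
level_pos = list(df.index.names).index("B")
by_name = df.index.levels[level_pos]

print(list(by_position))  # ['one', 'two']
print(list(by_name))      # ['one', 'two']
```

Note that levels holds the sorted set of labels the index was built with, not necessarily the labels currently in use.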

Solution 2 - Python

Andy Hayden's answer (index.levels[n]) works well in some scenarios but can give surprising results in others. My understanding is that pandas goes to great lengths to reuse indices where possible, so that many similarly indexed DataFrames do not each store their own copy in memory. As a result, the levels of a subset can still list values that no longer occur in it, which leads to the following annoying behavior:

import pandas as pd
import numpy as np

np.random.seed(0)

idx = pd.MultiIndex.from_product([['John', 'Josh', 'Alex'], list('abcde')], 
                                 names=['Person', 'Letter'])
large = pd.DataFrame(data=np.random.randn(15, 2), 
                     index=idx, 
                     columns=['one', 'two'])
# Keep only the rows whose Person label starts with 'Jo'
small = large.loc[[d[:2] == 'Jo' for d in large.index.get_level_values('Person')]]

print(small.index.levels[0])
print(large.index.levels[0])

Which outputs (the u prefixes come from Python 2; the exact repr varies across Python and pandas versions)

Index([u'Alex', u'John', u'Josh'], dtype='object')
Index([u'Alex', u'John', u'Josh'], dtype='object')

rather than the expected

Index([u'John', u'Josh'], dtype='object')
Index([u'Alex', u'John', u'Josh'], dtype='object')

As one person pointed out on the other thread, an idiom that feels natural and works correctly is:

small.index.get_level_values('Person').unique()
large.index.get_level_values('Person').unique()

I hope this helps someone else dodge the super-unexpected behavior that I ran into.
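
To make the difference concrete, here is a small sketch along the same lines. It swaps the random data for a simple range and also shows MultiIndex.remove_unused_levels(), which is not mentioned above but is a pandas method that prunes stale labels from the levels; treat it as one possible alternative rather than the recommended fix.

```python
import pandas as pd

idx = pd.MultiIndex.from_product(
    [["John", "Josh", "Alex"], list("abcde")], names=["Person", "Letter"]
)
large = pd.DataFrame({"one": range(15)}, index=idx)

# Keep only the people whose name starts with "Jo".
mask = large.index.get_level_values("Person").str.startswith("Jo")
small = large.loc[mask]

# The level still lists 'Alex' even though no row of `small` uses it.
stale = list(small.index.levels[0])

# Labels actually present in the subset, in order of first appearance.
present = list(small.index.get_level_values("Person").unique())

# remove_unused_levels() returns a new MultiIndex with pruned levels.
pruned = list(small.index.remove_unused_levels().levels[0])

print(stale)    # ['Alex', 'John', 'Josh']
print(present)  # ['John', 'Josh']
print(pruned)   # ['John', 'Josh']
```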

Solution 3 - Python

Another way is to use the unique() method of the index, passing the level name:

df.index.unique('B')

Unlike levels, this method is documented.
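
A short sketch of this approach on the question's frame, assuming a pandas version whose Index.unique accepts a level argument. The two-row subset is a made-up illustration to show that unique() reports only labels still present, whereas levels does not.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [0, 1, 2], "B": ["one", "one", "two"], "C": [3, 2, 1]}
).set_index(["A", "B"])

# The level can be given by name or by integer position.
uniq_by_name = list(df.index.unique(level="B"))
uniq_by_pos = list(df.index.unique(level=1))

# On a subset, unique() only reports labels that are still in use.
subset = df.iloc[:2]
subset_unique = list(subset.index.unique(level="B"))
subset_levels = list(subset.index.levels[1])

print(uniq_by_name)   # ['one', 'two']
print(subset_unique)  # ['one']
print(subset_levels)  # ['one', 'two']
```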

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type         Original Author  Original Content on Stackoverflow
Question             seth             View Question on Stackoverflow
Solution 1 - Python  Andy Hayden      View Answer on Stackoverflow
Solution 2 - Python  8one6            View Answer on Stackoverflow
Solution 3 - Python  mirkhosro        View Answer on Stackoverflow