Pandas: Get unique MultiIndex level values by label

PythonPandas

Python Problem Overview


Say you have this MultiIndex-ed DataFrame:

df = pd.DataFrame({'co':['DE','DE','FR','FR'],
                   'tp':['Lake','Forest','Lake','Forest'],
                   'area':[10,20,30,40],
                   'count':[7,5,2,3]})
df = df.set_index(['co','tp'])

Which looks like this:

           area  count
co tp
DE Lake      10      7
   Forest    20      5
FR Lake      30      2
   Forest    40      3

I would like to retrieve the unique values per index level. This can be accomplished using

df.index.levels[0]  # returns ['DE', 'FR]
df.index.levels[1]  # returns ['Lake', 'Forest']

What I would really like to do, is to retrieve these lists by addressing the levels by their name, i.e. 'co' and 'tp'. The shortest two ways I could find looks like this:

list(set(df.index.get_level_values('co')))  # returns ['DE', 'FR']
df.index.levels[df.index.names.index('co')]  # returns ['DE', 'FR']

But non of them are very elegant. Is there a shorter way?

Python Solutions


Solution 1 - Python

Pandas 0.23.0 finally introduced a much cleaner solution to this problem: the level argument to Index.unique():

In [3]: df.index.unique(level='co')
Out[3]: Index(['DE', 'FR'], dtype='object', name='co')

This is now the recommended solution. It is far more efficient because it avoids creating a complete representation of the level values in memory, and re-scanning it.

Solution 2 - Python

I guess u want unique values in a certain level (and by level names) of a multiindex. I usually do the following, which is a bit long.

In [11]: df.index.get_level_values('co').unique()
Out[11]: array(['DE', 'FR'], dtype=object)

Solution 3 - Python

An alternative approach is to find the number of levels by calling df.index.levels[level_index] where level_index can be inferred from df.index.names.index(level_name). In the above example level_name = 'co'.

The proposed answer by @Happy001 computes the unique which may be computationally intensive.

Solution 4 - Python

If you're going to do the level lookup repeatedly, you could create a map of your index level names to level unique values with:

df_level_value_map = {
    name: level 
    for name, level in zip(df.index.names, df.index.levels)
}
df_level_value_map['']

But this is not in any way more efficient (or shorter) than your original attempts if you're only going to do this lookup once.

I really wish there was a method on indexes that returned such a dictionary (or series?) with a name like:

df.index.get_level_map(levels={...})

Where the levels parameter can limit the map to a subset of the existing levels. I could do without the parameter if it could be a property like:

df.index.level_map

Solution 5 - Python

If you already know the index names, is it not straightforward to simply do: df['co'].unique() ?

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionojdoView Question on Stackoverflow
Solution 1 - PythonPietro BattistonView Answer on Stackoverflow
Solution 2 - PythonHappy001View Answer on Stackoverflow
Solution 3 - PythonHanan ShteingartView Answer on Stackoverflow
Solution 4 - PythonLeoRochaelView Answer on Stackoverflow
Solution 5 - PythonCyclicUniverseView Answer on Stackoverflow