How to iterate over pandas multiindex dataframe using index

PythonPandas

Python Problem Overview


I have a data frame df which looks like this. Date and Time are 2 multilevel index

                           observation1   observation2
date          Time                             
2012-11-02    9:15:00      79.373668      224
              9:16:00      130.841316     477
2012-11-03    9:15:00      45.312814      835
              9:16:00      123.776946     623
              9:17:00      153.76646      624
              9:18:00      463.276946     626
              9:19:00      663.176934     622
              9:20:00      763.77333      621
2012-11-04    9:15:00      115.449437     122
              9:16:00      123.776946     555
              9:17:00      153.76646      344
              9:18:00      463.276946     212

I want to run some complex process over daily data block.

Pseudo code would look like

 for count in df(level 0 index) :
     new_df = get only chunk for count
     complex_process(new_df)

So, first of all, I could not find a way to access only blocks for a date

2012-11-03    9:15:00      45.312814      835
              9:16:00      123.776946     623
              9:17:00      153.76646      624
              9:18:00      463.276946     626
              9:19:00      663.176934     622
              9:20:00      763.77333      621

and then send it for processing. I am doing this in for loop as I am not sure if there is any way to do it without mentioning exact value of level 0 column. I did some basic search and able to get df.index.get_level_values(0), but it returns me all the values and that causes loop to run multiple times for a day. I want to create a dataframe per day and send it for processing.

Python Solutions


Solution 1 - Python

One easy way would be to groupby the first level of the index - iterating over the groupby object will return the group keys and a subframe containing each group.

In [136]: for date, new_df in df.groupby(level=0):
     ...:     print(new_df)
     ...:     
                    observation1  observation2
date       Time                               
2012-11-02 9:15:00     79.373668           224
           9:16:00    130.841316           477

                    observation1  observation2
date       Time                               
2012-11-03 9:15:00     45.312814           835
           9:16:00    123.776946           623
           9:17:00    153.766460           624
           9:18:00    463.276946           626
           9:19:00    663.176934           622
           9:20:00    763.773330           621

                    observation1  observation2
date       Time                               
2012-11-04 9:15:00    115.449437           122
           9:16:00    123.776946           555
           9:17:00    153.766460           344
           9:18:00    463.276946           212

You can also use droplevel to remove the first index (the useless date index):

In [136]: for date, new_df in df.groupby(level=0):
     ...:     print(new_df.droplevel(0))
     ...:
         observation1  observation2
Time
9:15:00     79.373668           224
9:16:00    130.841316           477
...

Solution 2 - Python

What about this?

for idate in df.index.get_level_values('date'):
    complex_process(df.ix[idate], idate)

Solution 3 - Python

Tagging off of @psorenson answer, we can get unique level indices and its related data frame slices without numpy as follows:

for date in df.index.get_level_values('date').unique():
    print(df.loc[date])

Solution 4 - Python

Late to the party, I found that the following works, too:

for date in df.index.unique("date"):
    print(df.loc[date])

It uses the level optional parameter of the Index.unique method introduced in version 0.23.0.

You can specify either the level number or label.

Solution 5 - Python

Another alternative:

for date in df.index.levels[0]:
    print(df.loc[date])

The difference with the df.index.unique("date") proposed by @sanzoghenzo is that it refers to the index level by its number rather than name.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionYantraguruView Question on Stackoverflow
Solution 1 - PythonchrisbView Answer on Stackoverflow
Solution 2 - PythonpsorensonView Answer on Stackoverflow
Solution 3 - PythonmelbayView Answer on Stackoverflow
Solution 4 - PythonsanzoghenzoView Answer on Stackoverflow
Solution 5 - PythonRoger VadimView Answer on Stackoverflow