Sort pandas dataframe both on values of a column and index?

PythonPandasSortingDataframe

Python Problem Overview


Is it feasible to sort pandas dataframe by values of a column, but also by index?

If you sort a pandas dataframe by values of a column, you can get the resultant dataframe sorted by the column, but unfortunately, you see the order of your dataframe's index messy within the same value of a sorted column.

So, can I sort a dataframe by a column, such as the column named count but also sort it by the value of index? And is it also feasible to sort a column by descending order, but whereas sort a index by ascending order?

I know how to sort multiple columns in dataframe, and also know I can achieve what I'm asking here by first reset_index() the index and sort it, and then create the index again. But is it more intuitive and efficient way to do it?

Python Solutions


Solution 1 - Python

Pandas 0.23 finally gets you there :-D

You can now pass index names (and not only column names) as parameters to sort_values. So, this one-liner works:

df = df.sort_values(by = ['MyCol', 'MyIdx'], ascending = [False, True])

And if your index is currently unnamed:

df = df.rename_axis('MyIdx').sort_values(by = ['MyCol', 'MyIdx'], ascending = [False, True])

Solution 2 - Python

In pandas 0.23+ you can do it directly - see OmerB's answer. If you don't yet have 0.23+, read on.


I'd venture that the simplest way is to just copy your index over to a column, and then sort by both.

df['colFromIndex'] = df.index
df = df.sort(['count', 'colFromIndex'])

I'd also prefer to be able to just do something like df.sort(['count', 'index']), but of course that doesn't work.

Solution 3 - Python

As of pandas version 0.22.

You can temporarily set the column as an index, sort the index on that column and then reset. By default it will maintain the order of the existing index:

df = df.set_index('column_name', append=True).sort_index(level=1).reset_index(level=1)

I think the above could be done with 'inplace' options but I think it's easier to read as above.

Solution 4 - Python

You can use the ascending parameter in sort_index, but you must pass it as a list for it to work correctly as of pandas 0.22.0.

import pandas as pd
import numpy as np
df = pd.DataFrame({'idx_0':[2]*6+[1]*5,
                   'idx_1':[6,4,2,10,18,5,11,1,7,9,3],
                   'value_1':np.arange(11,0,-1),
                   'MyName':list('SORTEDFRAME')})

df = df.set_index(['idx_0','idx_1'])
df

Output:

            MyName  value_1
idx_0 idx_1                
2     6          S       11
      4          O       10
      2          R        9
      10         T        8
      18         E        7
      5          D        6
1     11         F        5
      1          R        4
      7          A        3
      9          M        2
      3          E        1

Sorting by values and index should get "FRAMESORTED" instead of "SORTEDFRAME"

df.sort_values('value_1', ascending=False)\
  .sort_index(level=0, ascending=[True])

Output:

            MyName  value_1
idx_0 idx_1                
1     11         F        5
      1          R        4
      7          A        3
      9          M        2
      3          E        1
2     6          S       11
      4          O       10
      2          R        9
      10         T        8
      18         E        7
      5          D        6

Note you must pass ascending parameter in sort_index as a list and not as a scalar. It will not work.

Solution 5 - Python

To sort a column descending, while maintaining the index ascending:

import pandas as pd
df = pd.DataFrame(index=range(5), data={'c': [4,2,2,4,2]})
df.index = df.index[::-1]
print df.sort(column='c', ascending=False)

Output:

   c
1  4
4  4
0  2
2  2
3  2

Solution 6 - Python

You can use a combination of groupby and apply:

In [2]: df = pd.DataFrame({
            'transID':  range(8),
            'Location': ['New York','Chicago','New York','New York','Atlanta','Los Angeles',
                            'Chicago','Atlanta'],
            'Sales':    np.random.randint(0,10000,8)}).set_index('transID')
In [3]: df
Out[3]:
        Location    Sales
transID
0       New York    1082
1       Chicago     1664
2       New York    692
3       New York    5669
4       Atlanta     7715
5       Los Angeles 987
6       Chicago     4085
7       Atlanta     2927

In [4]: df.groupby('Location').apply(lambda d: d.sort()).reset_index('Location',drop=True)
Out[4]:
        Location    Sales
transID
4       Atlanta     7715
7       Atlanta     2927
1       Chicago     1664
6       Chicago     4085
5       Los Angeles 987
0       New York    1082
2       New York    692
3       New York    5669

I drop 'Location' at in the last line because groupby inserts the grouped levels into the first positions in the index. Sorting and then dropping them preserves the sorted order.

Solution 7 - Python

I believe that the original order from sort_values is kept even when sort_index is applied, so this should work:

df.sort_values('count', ascending=False).sort_index(level=[index_level1, index_level2])

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionBlaszardView Question on Stackoverflow
Solution 1 - PythonOmerBView Answer on Stackoverflow
Solution 2 - PythonfantabolousView Answer on Stackoverflow
Solution 3 - PythonIain DView Answer on Stackoverflow
Solution 4 - PythonScott BostonView Answer on Stackoverflow
Solution 5 - PythoncyborgView Answer on Stackoverflow
Solution 6 - PythonMichael DelgadoView Answer on Stackoverflow
Solution 7 - PythonkekertView Answer on Stackoverflow