how to sort pandas dataframe from one column

PythonPandasDataframeSortingTime

Python Problem Overview


I have a data frame like this:

print(df)

        0          1     2
0   354.7      April   4.0
1    55.4     August   8.0
2   176.5   December  12.0
3    95.5   February   2.0
4    85.6    January   1.0
5     152       July   7.0
6   238.7       June   6.0
7   104.8      March   3.0
8   283.5        May   5.0
9   278.8   November  11.0
10  249.6    October  10.0
11  212.7  September   9.0

As you can see, months are not in calendar order. So I created a second column to get the month number corresponding to each month (1-12). From there, how can I sort this data frame according to calendar months' order?

Python Solutions


Solution 1 - Python

Use sort_values to sort the df by a specific column's values:

In [18]:
df.sort_values('2')

Out[18]:
        0          1     2
4    85.6    January   1.0
3    95.5   February   2.0
7   104.8      March   3.0
0   354.7      April   4.0
8   283.5        May   5.0
6   238.7       June   6.0
5   152.0       July   7.0
1    55.4     August   8.0
11  212.7  September   9.0
10  249.6    October  10.0
9   278.8   November  11.0
2   176.5   December  12.0

If you want to sort by two columns, pass a list of column labels to sort_values with the column labels ordered according to sort priority. If you use df.sort_values(['2', '0']), the result would be sorted by column 2 then column 0. Granted, this does not really make sense for this example because each value in df['2'] is unique.

Solution 2 - Python

I tried the solutions above and I do not achieve results, so I found a different solution that works for me. The ascending=False is to order the dataframe in descending order, by default it is True. I am using python 3.6.6 and pandas 0.23.4 versions.

final_df = df.sort_values(by=['2'], ascending=False)

You can see more details in pandas documentation here.

Solution 3 - Python

Using column name worked for me.

sorted_df = df.sort_values(by=['Column_name'], ascending=True)

Solution 4 - Python

Just as another solution:

Instead of creating the second column, you can categorize your string data(month name) and sort by that like this:

df.rename(columns={1:'month'},inplace=True)
df['month'] = pd.Categorical(df['month'],categories=['December','November','October','September','August','July','June','May','April','March','February','January'],ordered=True)
df = df.sort_values('month',ascending=False)

It will give you the ordered data by month name as you specified while creating the Categorical object.

Solution 5 - Python

Panda's sort_values does the work.

If one intends to keep the same variable name, don't forget the inplace=True (this performs the operation in-place)

df.sort_values(by=['2'], inplace=True)

One might as well assign the change (sort) to a variable, that may have the same name, such as the df as

df = df.sort_values(by=['2'])

Forgetting the steps mentioned above may lead one (as this user) to not be able to get the expected result.

Note that if one wants in descending order, one needs to pass ascending=False, such as

df = df.sort_values(by=['2'], ascending=False)

Solution 6 - Python

Just adding some more operations on data. Suppose we have a dataframe df, we can do several operations to get desired outputs

ID	       cost	     tax    label
1	    216590  	1600	test      
2	    523213   	1800	test 
3          250      1500    experiment

(df['label'].value_counts().to_frame().reset_index()).sort_values('label', ascending=False)

will give sorted output of labels as a dataframe

    index	label
0	test	    2
1	experiment	1

Solution 7 - Python

This worked for me

df.sort_values(by='Column_name', inplace=True, ascending=False)

Solution 8 - Python

You probably need to reset the index after sorting:

df = df.sort_values('2')
df = df.reset_index(drop=True)

Solution 9 - Python

Here is template of sort_values according to pandas documentation.

DataFrame.sort_values(by, axis=0,
                          ascending=True,
                          inplace=False,
                          kind='quicksort',
                          na_position='last',
                          ignore_index=False, key=None)[source]

In this case it will be like this.

df.sort_values(by=['2'])

API Reference pandas.DataFrame.sort_values

Solution 10 - Python

This one worked for me:

df=df.sort_values(by=[2])

Whereas:

df=df.sort_values(by=['2']) 

is not working.

Solution 11 - Python

Example: Assume you have a column with values 1 and 0 and you want to separate and use only one value, then:

// furniture is one of the columns in the csv file.

allrooms = data.groupby('furniture')['furniture'].agg('count') allrooms

myrooms1 = pan.DataFrame(allrooms, columns = ['furniture'], index = [1])

myrooms2 = pan.DataFrame(allrooms, columns = ['furniture'], index = [0])

print(myrooms1);print(myrooms2)

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionSachila RanawakaView Question on Stackoverflow
Solution 1 - PythonEdChumView Answer on Stackoverflow
Solution 2 - PythonJoel CarneiroView Answer on Stackoverflow
Solution 3 - PythonNirajView Answer on Stackoverflow
Solution 4 - Pythonalireza yazdandoostView Answer on Stackoverflow
Solution 5 - PythonGonçalo PeresView Answer on Stackoverflow
Solution 6 - PythonHari_pbView Answer on Stackoverflow
Solution 7 - Pythonsuzanne chenView Answer on Stackoverflow
Solution 8 - Pythonmojtaba rezaeiView Answer on Stackoverflow
Solution 9 - PythonNafees AhmadView Answer on Stackoverflow
Solution 10 - PythonHemapriya R.View Answer on Stackoverflow
Solution 11 - PythonAeStudiosView Answer on Stackoverflow