how to sort pandas dataframe from one column
PythonPandasDataframeSortingTimePython Problem Overview
I have a data frame like this:
print(df)
0 1 2
0 354.7 April 4.0
1 55.4 August 8.0
2 176.5 December 12.0
3 95.5 February 2.0
4 85.6 January 1.0
5 152 July 7.0
6 238.7 June 6.0
7 104.8 March 3.0
8 283.5 May 5.0
9 278.8 November 11.0
10 249.6 October 10.0
11 212.7 September 9.0
As you can see, months are not in calendar order. So I created a second column to get the month number corresponding to each month (1-12). From there, how can I sort this data frame according to calendar months' order?
Python Solutions
Solution 1 - Python
Use sort_values
to sort the df by a specific column's values:
In [18]:
df.sort_values('2')
Out[18]:
0 1 2
4 85.6 January 1.0
3 95.5 February 2.0
7 104.8 March 3.0
0 354.7 April 4.0
8 283.5 May 5.0
6 238.7 June 6.0
5 152.0 July 7.0
1 55.4 August 8.0
11 212.7 September 9.0
10 249.6 October 10.0
9 278.8 November 11.0
2 176.5 December 12.0
If you want to sort by two columns, pass a list of column labels to sort_values
with the column labels ordered according to sort priority. If you use df.sort_values(['2', '0'])
, the result would be sorted by column 2
then column 0
. Granted, this does not really make sense for this example because each value in df['2']
is unique.
Solution 2 - Python
I tried the solutions above and I do not achieve results, so I found a different solution that works for me. The ascending=False
is to order the dataframe in descending order, by default it is True
. I am using python 3.6.6 and pandas 0.23.4 versions.
final_df = df.sort_values(by=['2'], ascending=False)
You can see more details in pandas documentation here.
Solution 3 - Python
Using column name worked for me.
sorted_df = df.sort_values(by=['Column_name'], ascending=True)
Solution 4 - Python
Just as another solution:
Instead of creating the second column, you can categorize your string data(month name) and sort by that like this:
df.rename(columns={1:'month'},inplace=True)
df['month'] = pd.Categorical(df['month'],categories=['December','November','October','September','August','July','June','May','April','March','February','January'],ordered=True)
df = df.sort_values('month',ascending=False)
It will give you the ordered data by month name
as you specified while creating the Categorical
object.
Solution 5 - Python
Panda's sort_values
does the work.
If one intends to keep the same variable name, don't forget the inplace=True
(this performs the operation in-place)
df.sort_values(by=['2'], inplace=True)
One might as well assign the change (sort) to a variable, that may have the same name, such as the df
as
df = df.sort_values(by=['2'])
Forgetting the steps mentioned above may lead one (as this user) to not be able to get the expected result.
Note that if one wants in descending order, one needs to pass ascending=False
, such as
df = df.sort_values(by=['2'], ascending=False)
Solution 6 - Python
Just adding some more operations on data. Suppose we have a dataframe df
, we can do several operations to get desired outputs
ID cost tax label
1 216590 1600 test
2 523213 1800 test
3 250 1500 experiment
(df['label'].value_counts().to_frame().reset_index()).sort_values('label', ascending=False)
will give sorted
output of labels as a dataframe
index label
0 test 2
1 experiment 1
Solution 7 - Python
This worked for me
df.sort_values(by='Column_name', inplace=True, ascending=False)
Solution 8 - Python
You probably need to reset the index after sorting:
df = df.sort_values('2')
df = df.reset_index(drop=True)
Solution 9 - Python
Here is template of sort_values according to pandas documentation.
DataFrame.sort_values(by, axis=0,
ascending=True,
inplace=False,
kind='quicksort',
na_position='last',
ignore_index=False, key=None)[source]
In this case it will be like this.
df.sort_values(by=['2'])
API Reference pandas.DataFrame.sort_values
Solution 10 - Python
This one worked for me:
df=df.sort_values(by=[2])
Whereas:
df=df.sort_values(by=['2'])
is not working.
Solution 11 - Python
Example: Assume you have a column with values 1 and 0 and you want to separate and use only one value, then:
// furniture is one of the columns in the csv file.
allrooms = data.groupby('furniture')['furniture'].agg('count') allrooms
myrooms1 = pan.DataFrame(allrooms, columns = ['furniture'], index = [1])
myrooms2 = pan.DataFrame(allrooms, columns = ['furniture'], index = [0])
print(myrooms1);print(myrooms2)