Sorting columns in pandas dataframe based on column name


Python Problem Overview


I have a dataframe with over 200 columns. Because of how they were generated, the columns are in this order:

['Q1.3','Q6.1','Q1.2','Q1.1',......]

I need to sort the columns as follows:

['Q1.1','Q1.2','Q1.3',.....'Q6.1',......]

Is there some way for me to do this within Python?

Python Solutions


Solution 1 - Python

df = df.reindex(sorted(df.columns), axis=1)

This assumes that sorting the column names will give the order you want. If your column names won't sort lexicographically (e.g., if you want column Q10.3 to appear after Q9.1), you'll need to sort differently, but that has nothing to do with pandas.
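
One way to handle names such as Q10.3 is a "natural sort" key that compares the digit runs as numbers. The following is only a sketch: natural_key is a hypothetical helper and the sample frame is made up for illustration.

```python
import re
import pandas as pd

def natural_key(name):
    # Split the name into digit and non-digit runs; digit runs become ints
    # so that 10 compares greater than 9.
    return [int(part) if part.isdigit() else part
            for part in re.split(r'(\d+)', name)]

df = pd.DataFrame([[1, 2, 3]], columns=['Q10.3', 'Q9.1', 'Q1.2'])

plain = sorted(df.columns)                       # lexicographic order
df_sorted = df.reindex(sorted(df.columns, key=natural_key), axis=1)

print(plain)                     # ['Q1.2', 'Q10.3', 'Q9.1']
print(list(df_sorted.columns))   # ['Q1.2', 'Q9.1', 'Q10.3']
```

This key assumes every name follows the same letter/digit pattern; mixed patterns could end up comparing a string with an int and raise a TypeError.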

Solution 2 - Python

You can also do this more succinctly:

df.sort_index(axis=1)

Make sure you assign the result back:

df = df.sort_index(axis=1)

Or, do it in-place:

df.sort_index(axis=1, inplace=True)
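
If the default ordering is not what you want, sort_index also accepts a key argument (assuming pandas 1.1 or later). A small sketch with an invented frame; the key is called with the whole column Index, not with one name at a time:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4]], columns=['Q10.2', 'Q1.3', 'Q9.1', 'Q1.2'])

# The key callable receives the column Index and must return an
# Index-like object of the same length; here each name maps to a float.
result = df.sort_index(
    axis=1,
    key=lambda idx: idx.map(lambda name: float(name[1:])),
)

print(list(result.columns))  # ['Q1.2', 'Q1.3', 'Q9.1', 'Q10.2']
print(list(df.columns))      # original order; df itself is not modified
```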

Solution 3 - Python

You can just do:

df[sorted(df.columns)]


Edit: A shorter version, since iterating over a DataFrame yields its column names:

df[sorted(df)]

Solution 4 - Python

The key function from tweet's answer (Solution 9) can be combined with BrenBarn's answer (Solution 1):

data.reindex(sorted(data.columns, key=lambda x: float(x[1:])), axis=1)

So for your example, say:

from numpy.random import randint
from pandas import DataFrame

vals = randint(low=16, high=80, size=25).reshape(5, 5)
cols = ['Q1.3', 'Q6.1', 'Q1.2', 'Q9.1', 'Q10.2']
data = DataFrame(vals, columns=cols)

You get:

data

    Q1.3    Q6.1    Q1.2    Q9.1    Q10.2
0   73      29      63      51      72
1   61      29      32      68      57
2   36      49      76      18      37
3   63      61      51      30      31
4   36      66      71      24      77

Then do:

data = data.reindex(sorted(data.columns, key=lambda x: float(x[1:])), axis=1)

resulting in:

data

     Q1.2    Q1.3    Q6.1    Q9.1    Q10.2
0    63      73      29      51      72
1    32      61      29      68      57
2    76      36      49      18      37
3    51      63      61      30      31
4    71      36      66      24      77

Solution 5 - Python

For a handful of columns, you can list them in whatever order you want:

# the columns are currently in the order ['A', 'B', 'C']
df = df[['C', 'B', 'A']]

This example shows reordering columns and selecting a subset of them at the same time:

import pandas

d = {'col1':[1, 2, 3], 'col2':[4, 5, 6], 'col3':[7, 8, 9], 'col4':[17, 18, 19]}
df = pandas.DataFrame(d)

You get:

col1  col2  col3  col4
 1     4     7    17
 2     5     8    18
 3     6     9    19

Then do:

df = df[['col3', 'col2', 'col1']]

Resulting in:

col3  col2  col1
7     4     1
8     5     2
9     6     3     

Solution 6 - Python

If you need an arbitrary sequence instead of sorted sequence, you could do:

sequence = ['Q1.1','Q1.2','Q1.3',.....'Q6.1',......]
your_dataframe = your_dataframe.reindex(columns=sequence)

I tested this in Python 2.7.10 and it worked for me.
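
One thing to watch with reindex is that names in the sequence that are not in the DataFrame do not raise an error; they come back as columns of NaN. A short sketch with made-up data, where 'Q7.1' is deliberately absent from the frame:

```python
import pandas as pd

df = pd.DataFrame({'Q1.3': [1, 2], 'Q6.1': [3, 4], 'Q1.2': [5, 6]})

# 'Q7.1' does not exist in df (illustrative only).
sequence = ['Q1.2', 'Q1.3', 'Q6.1', 'Q7.1']

reindexed = df.reindex(columns=sequence)
print(reindexed)  # the 'Q7.1' column is present and filled with NaN

# To keep only names that actually exist, filter the sequence first.
present = [name for name in sequence if name in df.columns]
subset = df[present]
print(list(subset.columns))  # ['Q1.2', 'Q1.3', 'Q6.1']
```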

Solution 7 - Python

Don't forget to add inplace=True to Wes' answer (Solution 2), or assign the result to a variable.

df.sort_index(axis=1, inplace=True)

Solution 8 - Python

The quickest method is:

df.sort_index(axis=1)

Be aware that this returns a new DataFrame rather than modifying df, so you need to store the result:

sorted_df = df.sort_index(axis=1)

Solution 9 - Python

The list.sort method and the built-in sorted function accept a key argument: a custom function that extracts the value used for comparison.

>>> ls = ['Q1.3', 'Q6.1', 'Q1.2']
>>> sorted(ls, key=lambda x: float(x[1:]))
['Q1.2', 'Q1.3', 'Q6.1']
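
A caveat with the float key: it reads 'Q1.10' as 1.1, so that name sorts before 'Q1.2'. If the part after the dot is really a separate counter, a tuple of ints may fit better. A sketch, where question_key is a hypothetical helper that assumes every name looks like Q<number>.<number>:

```python
cols = ['Q1.10', 'Q1.2', 'Q1.9']

# float('1.10') == 1.1, so 'Q1.10' lands first.
by_float = sorted(cols, key=lambda x: float(x[1:]))
print(by_float)  # ['Q1.10', 'Q1.2', 'Q1.9']

def question_key(name):
    # 'Q1.10' -> (1, 10): compare major and minor parts as integers.
    return tuple(int(part) for part in name[1:].split('.'))

by_parts = sorted(cols, key=question_key)
print(by_parts)  # ['Q1.2', 'Q1.9', 'Q1.10']
```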

Solution 10 - Python

One use-case is that you have named (some of) your columns with some prefix, and you want the columns sorted with those prefixes all together and in some particular order (not alphabetical).

For example, you might start all of your features with Ft_, labels with Lbl_, etc, and you want all unprefixed columns first, then all features, then the label. You can do this with the following function (I will note a possible efficiency problem using sum to reduce lists, but this isn't an issue unless you have a LOT of columns, which I do not):

import re

def sortedcols(df, groups=('Ft_', 'Lbl_')):
    # First pattern matches columns with none of the prefixes; the rest match one prefix each, in order.
    patterns = ['^(?!(%s))' % '|'.join(groups)] + ['^%s' % g for g in groups]
    return df[sum([[c for c in df.columns if re.search(p, c)] for p in patterns], [])]
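
For readers who find the one-liner hard to follow, here is a sketch of the same idea using str.startswith instead of regular expressions. The name group_columns_by_prefix and the sample columns are made up for illustration, and unlike the function above it treats the prefixes as literal strings, not patterns.

```python
import pandas as pd

def group_columns_by_prefix(df, prefixes=('Ft_', 'Lbl_')):
    # Columns with none of the prefixes come first, in their existing order.
    ordered = [c for c in df.columns if not c.startswith(tuple(prefixes))]
    # Then each prefix group, in the order the prefixes were given.
    for prefix in prefixes:
        ordered.extend(c for c in df.columns if c.startswith(prefix))
    return df[ordered]

df = pd.DataFrame(columns=['Lbl_y', 'Ft_b', 'id', 'Ft_a', 'date'])
result = group_columns_by_prefix(df)
print(list(result.columns))  # ['id', 'date', 'Ft_b', 'Ft_a', 'Lbl_y']
```

Within each group the columns keep their original relative order; wrap the comprehensions in sorted() if you also want them alphabetical inside a group.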

Solution 11 - Python

print(df.sort_values(by='Frequency', ascending=False))

Here by is the name of the column to sort on. Note that this sorts the rows of the dataset by the values in that column; it does not reorder the columns themselves.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type           Original Author        Original Content on Stackoverflow
Question               pythOnometrist         View Question on Stackoverflow
Solution 1 - Python    BrenBarn               View Answer on Stackoverflow
Solution 2 - Python    Wes McKinney           View Answer on Stackoverflow
Solution 3 - Python    Ivelin                 View Answer on Stackoverflow
Solution 4 - Python    Jeremy Low             View Answer on Stackoverflow
Solution 5 - Python    Myeongsik Joo          View Answer on Stackoverflow
Solution 6 - Python    M.Z                    View Answer on Stackoverflow
Solution 7 - Python    burkesquires           View Answer on Stackoverflow
Solution 8 - Python    multigoodverse         View Answer on Stackoverflow
Solution 9 - Python    tweet                  View Answer on Stackoverflow
Solution 10 - Python   Roko Mijic             View Answer on Stackoverflow
Solution 11 - Python   Aravind Krishnakumar   View Answer on Stackoverflow