Move column by name to front of table in pandas
PythonPandasMoveDataframeShiftPython Problem Overview
Here is my df:
Net Upper Lower Mid Zsore
Answer option
More than once a day 0% 0.22% -0.12% 2 65
Once a day 0% 0.32% -0.19% 3 45
Several times a week 2% 2.45% 1.10% 4 78
Once a week 1% 1.63% -0.40% 6 65
How can I move a column by name ("Mid"
) to the front of the table, index 0. This is what the result should look like:
Mid Upper Lower Net Zsore
Answer option
More than once a day 2 0.22% -0.12% 0% 65
Once a day 3 0.32% -0.19% 0% 45
Several times a week 4 2.45% 1.10% 2% 78
Once a week 6 1.63% -0.40% 1% 65
My current code moves the column by index using df.columns.tolist()
but I'd like to shift it by name.
Python Solutions
Solution 1 - Python
We can use ix
to reorder by passing a list:
In [27]:
# get a list of columns
cols = list(df)
# move the column to head of list using index, pop and insert
cols.insert(0, cols.pop(cols.index('Mid')))
cols
Out[27]:
['Mid', 'Net', 'Upper', 'Lower', 'Zsore']
In [28]:
# use ix to reorder
df = df.ix[:, cols]
df
Out[28]:
Mid Net Upper Lower Zsore
Answer_option
More_than_once_a_day 2 0% 0.22% -0.12% 65
Once_a_day 3 0% 0.32% -0.19% 45
Several_times_a_week 4 2% 2.45% 1.10% 78
Once_a_week 6 1% 1.63% -0.40% 65
Another method is to take a reference to the column and reinsert it at the front:
In [39]:
mid = df['Mid']
df.drop(labels=['Mid'], axis=1,inplace = True)
df.insert(0, 'Mid', mid)
df
Out[39]:
Mid Net Upper Lower Zsore
Answer_option
More_than_once_a_day 2 0% 0.22% -0.12% 65
Once_a_day 3 0% 0.32% -0.19% 45
Several_times_a_week 4 2% 2.45% 1.10% 78
Once_a_week 6 1% 1.63% -0.40% 65
You can also use loc
to achieve the same result as ix
will be deprecated in a future version of pandas from 0.20.0
onwards:
df = df.loc[:, cols]
Solution 2 - Python
Maybe I'm missing something, but a lot of these answers seem overly complicated. You should be able to just set the columns within a single list:
Column to the front:
df = df[ ['Mid'] + [ col for col in df.columns if col != 'Mid' ] ]
Or if instead, you want to move it to the back:
df = df[ [ col for col in df.columns if col != 'Mid' ] + ['Mid'] ]
Or if you wanted to move more than one column:
cols_to_move = ['Mid', 'Zsore']
df = df[ cols_to_move + [ col for col in df.columns if col not in cols_to_move ] ]
Solution 3 - Python
I prefer this solution:
col = df.pop("Mid")
df.insert(0, col.name, col)
It's simpler to read and faster than other suggested answers.
def move_column_inplace(df, col, pos):
col = df.pop(col)
df.insert(pos, col.name, col)
Performance assessment:
For this test, the currently last column is moved to the front in each repetition. In-place methods generally perform better. While citynorman's solution can be made in-place, Ed Chum's method based on .loc
and sachinnm's method based on reindex
cannot.
While other methods are generic, citynorman's solution is limited to pos=0
. I didn't observe any performance difference between df.loc[cols]
and df[cols]
, which is why I didn't include some other suggestions.
I tested with python 3.6.8 and pandas 0.24.2 on a MacBook Pro (Mid 2015).
import numpy as np
import pandas as pd
n_cols = 11
df = pd.DataFrame(np.random.randn(200000, n_cols),
columns=range(n_cols))
def move_column_inplace(df, col, pos):
col = df.pop(col)
df.insert(pos, col.name, col)
def move_to_front_normanius_inplace(df, col):
move_column_inplace(df, col, 0)
return df
def move_to_front_chum(df, col):
cols = list(df)
cols.insert(0, cols.pop(cols.index(col)))
return df.loc[:, cols]
def move_to_front_chum_inplace(df, col):
col = df[col]
df.drop(col.name, axis=1, inplace=True)
df.insert(0, col.name, col)
return df
def move_to_front_elpastor(df, col):
cols = [col] + [ c for c in df.columns if c!=col ]
return df[cols] # or df.loc[cols]
def move_to_front_sachinmm(df, col):
cols = df.columns.tolist()
cols.insert(0, cols.pop(cols.index(col)))
df = df.reindex(columns=cols, copy=False)
return df
def move_to_front_citynorman_inplace(df, col):
# This approach exploits that reset_index() moves the index
# at the first position of the data frame.
df.set_index(col, inplace=True)
df.reset_index(inplace=True)
return df
def test(method, df):
col = np.random.randint(0, n_cols)
method(df, col)
col = np.random.randint(0, n_cols)
ret_mine = move_to_front_normanius_inplace(df.copy(), col)
ret_chum1 = move_to_front_chum(df.copy(), col)
ret_chum2 = move_to_front_chum_inplace(df.copy(), col)
ret_elpas = move_to_front_elpastor(df.copy(), col)
ret_sach = move_to_front_sachinmm(df.copy(), col)
ret_city = move_to_front_citynorman_inplace(df.copy(), col)
# Assert equivalence of solutions.
assert(ret_mine.equals(ret_chum1))
assert(ret_mine.equals(ret_chum2))
assert(ret_mine.equals(ret_elpas))
assert(ret_mine.equals(ret_sach))
assert(ret_mine.equals(ret_city))
Results:
# For n_cols = 11:
%timeit test(move_to_front_normanius_inplace, df)
# 1.05 ms ± 42.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit test(move_to_front_citynorman_inplace, df)
# 1.68 ms ± 46.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit test(move_to_front_sachinmm, df)
# 3.24 ms ± 96.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit test(move_to_front_chum, df)
# 3.84 ms ± 114 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit test(move_to_front_elpastor, df)
# 3.85 ms ± 58.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit test(move_to_front_chum_inplace, df)
# 9.67 ms ± 101 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# For n_cols = 31:
%timeit test(move_to_front_normanius_inplace, df)
# 1.26 ms ± 31.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit test(move_to_front_citynorman_inplace, df)
# 1.95 ms ± 260 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit test(move_to_front_sachinmm, df)
# 10.7 ms ± 348 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit test(move_to_front_chum, df)
# 11.5 ms ± 869 µs per loop (mean ± std. dev. of 7 runs, 100 loops each
%timeit test(move_to_front_elpastor, df)
# 11.4 ms ± 598 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit test(move_to_front_chum_inplace, df)
# 31.4 ms ± 1.89 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Solution 4 - Python
You can use the df.reindex() function in pandas. df is
Net Upper Lower Mid Zsore
Answer option
More than once a day 0% 0.22% -0.12% 2 65
Once a day 0% 0.32% -0.19% 3 45
Several times a week 2% 2.45% 1.10% 4 78
Once a week 1% 1.63% -0.40% 6 65
define an list of column names
cols = df.columns.tolist()
cols
Out[13]: ['Net', 'Upper', 'Lower', 'Mid', 'Zsore']
move the column name to wherever you want
cols.insert(0, cols.pop(cols.index('Mid')))
cols
Out[16]: ['Mid', 'Net', 'Upper', 'Lower', 'Zsore']
then use df.reindex()
function to reorder
df = df.reindex(columns= cols)
out put is: df
Mid Upper Lower Net Zsore
Answer option
More than once a day 2 0.22% -0.12% 0% 65
Once a day 3 0.32% -0.19% 0% 45
Several times a week 4 2.45% 1.10% 2% 78
Once a week 6 1.63% -0.40% 1% 65
Solution 5 - Python
I didn't like how I had to explicitly specify all the other column in the other solutions.
cfg_col_sel = ['Mid', 'Zscore']
cfg_col_sel = cfg_col_sel+[s for s in df.columns if not s in cfg_col_sel]
df = df[cfg_col_sel]
This is another way of doing it but less flexible
df = df.set_index('Mid').reset_index()
Solution 6 - Python
Here is a generic set of code that I frequently use to rearrange the position of columns. You may find it useful.
cols = df.columns.tolist()
n = int(cols.index('Mid'))
cols = [cols[n]] + cols[:n] + cols[n+1:]
df = df[cols]
Solution 7 - Python
To reorder the rows of a DataFrame just use a list as follows.
df = df[['Mid', 'Net', 'Upper', 'Lower', 'Zsore']]
This makes it very obvious what was done when reading the code later. Also use:
df.columns
Out[1]: Index(['Net', 'Upper', 'Lower', 'Mid', 'Zsore'], dtype='object')
Then cut and paste to reorder.
For a DataFrame with many columns, store the list of columns in a variable and pop the desired column to the front of the list. Here is an example:
cols = [str(col_name) for col_name in range(1001)]
data = np.random.rand(10,1001)
df = pd.DataFrame(data=data, columns=cols)
mv_col = cols.pop(cols.index('77'))
df = df[[mv_col] + cols]
Now df.columns
has.
Index(['77', '0', '1', '2', '3', '4', '5', '6', '7', '8', ... '991', '992', '993', '994', '995', '996', '997', '998', '999', '1000'],
dtype='object', length=1001)
Solution 8 - Python
df.set_index('Mid').reset_index()
seems to be a pretty easy way about this.
Solution 9 - Python
If you want to move a column to the front of the dataframe, you can use
use set_index()
.
df.set_index(df.pop('column_name'), inplace=True)
df.reset_index(inplace=True)
You first need to set the column you want to bring to the front as the index of the dataframe (and we do a pop
in order to drop the column from the original dataframe before setting it as the index in order to avoid name collision in the next action) and finally call reset_index()
to make the old index the first column of the dataframe.
For more details see How to change the order of dataframe columns in pandas.
Solution 10 - Python
Here is a very simple answer to this.
Don't forget the two (()) 'brackets' around columns names.Otherwise, it'll give you an error.
# here you can add below line and it should work
df = df[list(('Mid','Upper', 'Lower', 'Net','Zsore'))]
df
Mid Upper Lower Net Zsore
Answer option
More than once a day 2 0.22% -0.12% 0% 65
Once a day 3 0.32% -0.19% 0% 45
Several times a week 4 2.45% 1.10% 2% 78
Once a week 6 1.63% -0.40% 1% 65
Solution 11 - Python
If you want to move to another place and reassign the name as well, you can do as the following:
df.insert(0, 'new col name', col)
df.pop("Mid")
<https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.insert.html>
Solution 12 - Python
The most simplist thing you can try is:
df=df[[ 'Mid', 'Upper', 'Lower', 'Net' , 'Zsore']]