How to select all columns, except one column in pandas?
PythonPandasPython Problem Overview
I have a dataframe look like this:
import pandas
import numpy as np
df = DataFrame(np.random.rand(4,4), columns = list('abcd'))
df
a b c d
0 0.418762 0.042369 0.869203 0.972314
1 0.991058 0.510228 0.594784 0.534366
2 0.407472 0.259811 0.396664 0.894202
3 0.726168 0.139531 0.324932 0.906575
How I can get all columns except column b
?
Python Solutions
Solution 1 - Python
When the columns are not a MultiIndex, df.columns
is just an array of column names so you can do:
df.loc[:, df.columns != 'b']
a c d
0 0.561196 0.013768 0.772827
1 0.882641 0.615396 0.075381
2 0.368824 0.651378 0.397203
3 0.788730 0.568099 0.869127
Solution 2 - Python
Don't use ix
. It's deprecated. The most readable and idiomatic way of doing this is df.drop()
:
>>> df
a b c d
0 0.175127 0.191051 0.382122 0.869242
1 0.414376 0.300502 0.554819 0.497524
2 0.142878 0.406830 0.314240 0.093132
3 0.337368 0.851783 0.933441 0.949598
>>> df.drop('b', axis=1)
a c d
0 0.175127 0.382122 0.869242
1 0.414376 0.554819 0.497524
2 0.142878 0.314240 0.093132
3 0.337368 0.933441 0.949598
Note that by default, .drop()
does not operate inplace; despite the ominous name, df
is unharmed by this process. If you want to permanently remove b
from df
, do df.drop('b', inplace=True)
.
df.drop()
also accepts a list of labels, e.g. df.drop(['a', 'b'], axis=1)
will drop column a
and b
.
Solution 3 - Python
df[df.columns.difference(['b'])]
Out:
a c d
0 0.427809 0.459807 0.333869
1 0.678031 0.668346 0.645951
2 0.996573 0.673730 0.314911
3 0.786942 0.719665 0.330833
Solution 4 - Python
You can use df.columns.isin()
df.loc[:, ~df.columns.isin(['b'])]
When you want to drop multiple columns, as simple as:
df.loc[:, ~df.columns.isin(['col1', 'col2'])]
Solution 5 - Python
You can drop
columns in index:
df[df.columns.drop('b')]
or
df.loc[:, df.columns.drop('b')]
Output:
a c d
0 0.418762 0.869203 0.972314
1 0.991058 0.594784 0.534366
2 0.407472 0.396664 0.894202
3 0.726168 0.324932 0.906575
Solution 6 - Python
Here is another way:
df[[i for i in list(df.columns) if i != '<your column>']]
You just pass all columns to be shown except of the one you do not want.
Solution 7 - Python
Here is a one line lambda:
df[map(lambda x :x not in ['b'], list(df.columns))]
before:
import pandas
import numpy as np
df = pd.DataFrame(np.random.rand(4,4), columns = list('abcd'))
df
a b c d
0 0.774951 0.079351 0.118437 0.735799
1 0.615547 0.203062 0.437672 0.912781
2 0.804140 0.708514 0.156943 0.104416
3 0.226051 0.641862 0.739839 0.434230
after:
df[map(lambda x :x not in ['b'], list(df.columns))]
a c d
0 0.774951 0.118437 0.735799
1 0.615547 0.437672 0.912781
2 0.804140 0.156943 0.104416
3 0.226051 0.739839 0.434230
Solution 8 - Python
I think the best way to do is the way mentioned by @Salvador Dali. Not that the others are wrong.
Because when you have a data set where you just want to select one column and put it into one variable and the rest of the columns into another for comparison or computational purposes. Then dropping the column of the data set might not help. Of course there are use cases for that as well.
x_cols = [x for x in data.columns if x != 'name of column to be excluded']
Then you can put those collection of columns in variable x_cols
into another variable like x_cols1
for other computation.
ex: x_cols1 = data[x_cols]
Solution 9 - Python
Another slight modification to @Salvador Dali enables a list of columns to exclude:
df[[i for i in list(df.columns) if i not in [list_of_columns_to_exclude]]]
or
df.loc[:,[i for i in list(df.columns) if i not in [list_of_columns_to_exclude]]]
Solution 10 - Python
I think a nice solution is with the function filter of pandas and regex (match everything except "b"):
df.filter(regex="^(?!b$)")
Solution 11 - Python
Similar to @Toms answer, it is also possible to select all columns except "b" without using .loc, like so:
df[df.columns[~df.columns.isin(['b'])]]