Extracting specific selected columns to new DataFrame as a copy

Python, Pandas, Chained Assignment

Python Problem Overview


I have a pandas DataFrame with 4 columns and I want to create a new DataFrame that has only three of them. This question is similar to https://stackoverflow.com/questions/10085806/extracting-specific-columns-from-a-data-frame but for pandas, not R. The following code does not work: it raises an error and is certainly not the pandasnic way to do it.

import pandas as pd
old = pd.DataFrame({'A' : [4,5], 'B' : [10,20], 'C' : [100,50], 'D' : [-30,-50]})
new = pd.DataFrame(zip(old.A, old.C, old.D)) # raises TypeError: data argument can't be an iterator 

What is the pandasnic way to do it?
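For reference, the error in the attempt above comes from handing the constructor a lazy zip iterator. A minimal sketch of how that attempt could be made to run, assuming the same `old` frame (the variable name `rows` is only illustrative):

```python
import pandas as pd

old = pd.DataFrame({'A': [4, 5], 'B': [10, 20], 'C': [100, 50], 'D': [-30, -50]})

# zip() is lazy in Python 3, so materialize it into a list of row tuples first.
rows = list(zip(old.A, old.C, old.D))

# zip() does not carry column names over, so they are passed explicitly.
new = pd.DataFrame(rows, columns=['A', 'C', 'D'])
print(new)
```

This works, but it rebuilds the frame row by row and drops the original column labels, which is why the answers below select columns directly instead.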

Python Solutions


Solution 1 - Python

There is a way of doing this, and it actually looks similar to R:

new = old[['A', 'C', 'D']].copy()

Here you are just selecting the columns you want from the original data frame and creating a variable for those. If you want to modify the new dataframe at all you'll probably want to use .copy() to avoid a SettingWithCopyWarning.
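A small sketch of what the .copy() buys you, assuming the `old` frame from the question: after copying, edits to `new` do not touch `old`.

```python
import pandas as pd

old = pd.DataFrame({'A': [4, 5], 'B': [10, 20], 'C': [100, 50], 'D': [-30, -50]})

# Select three columns and take an explicit, independent copy.
new = old[['A', 'C', 'D']].copy()

# Modify the copy only.
new.loc[0, 'A'] = 999

print(new.loc[0, 'A'])  # value in the copy
print(old.loc[0, 'A'])  # value in the original, left untouched
```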

An alternative method is to use filter which will create a copy by default:

new = old.filter(['A','C','D'], axis=1)

Finally, depending on the number of columns in your original dataframe, it might be more succinct to express this using a drop (this will also create a copy by default):

new = old.drop('B', axis=1)
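To see that the three approaches are interchangeable here, the following sketch builds the same three-column frame each way and compares the results. It assumes the `old` frame from the question; the names `by_selection`, `by_filter` and `by_drop` are only illustrative.

```python
import pandas as pd

old = pd.DataFrame({'A': [4, 5], 'B': [10, 20], 'C': [100, 50], 'D': [-30, -50]})

# 1. Select by a list of labels, then copy explicitly.
by_selection = old[['A', 'C', 'D']].copy()

# 2. Keep only the listed labels with filter().
by_filter = old.filter(['A', 'C', 'D'], axis=1)

# 3. Drop the one column that is not wanted.
by_drop = old.drop('B', axis=1)

# DataFrame.equals compares shape, labels, dtypes and values.
print(by_selection.equals(by_filter))
print(by_selection.equals(by_drop))
```

Which one reads best mostly depends on whether it is shorter to list the columns you keep or the columns you discard.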

Solution 2 - Python

The easiest way is:

new = old[['A','C','D']]

Solution 3 - Python

Another simpler way seems to be:

new = pd.DataFrame([old.A, old.B, old.C]).transpose()

Here old.column_name gives you a Series. Make a list of all the column Series you want to retain and pass it to the DataFrame constructor. Each Series becomes a row of the new frame, so a transpose is needed to turn them back into columns.

In [14]:pd.DataFrame([old.A, old.B, old.C]).transpose()
Out[14]: 
   A   B    C
0  4  10  100
1  5  20   50
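One caveat with the transpose approach, as far as I understand it: because each Series first becomes a row, columns of different types get mixed together and can end up as object dtype. The sketch below uses a hypothetical frame `mixed` (not from the question) to show this, and uses pd.concat with axis=1 as a swapped-in alternative that places the Series side by side without the row-wise detour.

```python
import pandas as pd

# Hypothetical frame with one integer column and one string column.
mixed = pd.DataFrame({'A': [4, 5], 'S': ['x', 'y'], 'C': [100, 50]})

# Transpose approach: each Series is first stored as a row.
transposed = pd.DataFrame([mixed.A, mixed.S]).transpose()

# Alternative: concatenate the Series as columns directly.
concatenated = pd.concat([mixed.A, mixed.S], axis=1)

print(transposed.dtypes)
print(concatenated.dtypes)
```

With the all-integer `old` frame from the question the two give the same result, so this only matters once column types differ.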

Solution 4 - Python

Select columns by index:

# selected column indices: 0, 2, 3 (columns A, C, D of old)
new = old.iloc[:, [0, 2, 3]].copy()
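If you know the column names but want to go through positions anyway, a possible sketch is to look the positions up with Index.get_indexer rather than hard-coding them. This assumes the `old` frame from the question; `positions` is an illustrative name.

```python
import pandas as pd

old = pd.DataFrame({'A': [4, 5], 'B': [10, 20], 'C': [100, 50], 'D': [-30, -50]})

# Translate labels into integer positions.
# Caveat: get_indexer returns -1 for a label that is not found,
# and iloc would treat -1 as "last column", so check before using it.
positions = old.columns.get_indexer(['A', 'C', 'D'])
if (positions == -1).any():
    raise KeyError("at least one requested column is missing")

# Positional selection, copied so later edits do not touch old.
new = old.iloc[:, positions].copy()
print(new)
```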

Solution 5 - Python

As far as I can tell, you don't necessarily need to specify the axis when using the filter function.

new = old.filter(['A','B','D'])

returns the same dataframe as

new = old.filter(['A','B','D'], axis=1)
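A quick way to check this claim, plus one behavioural difference worth knowing: filter silently skips labels that do not exist, whereas bracket selection raises. The sketch assumes the `old` frame from the question; 'Z' is a deliberately missing, made-up column name.

```python
import pandas as pd

old = pd.DataFrame({'A': [4, 5], 'B': [10, 20], 'C': [100, 50], 'D': [-30, -50]})

# With and without axis=1: for a DataFrame, filter() works on columns by default.
with_axis = old.filter(['A', 'B', 'D'], axis=1)
without_axis = old.filter(['A', 'B', 'D'])
print(with_axis.equals(without_axis))

# filter() ignores labels that are not present ...
lenient = old.filter(['A', 'Z'])
print(list(lenient.columns))

# ... while selecting with a list of labels raises KeyError for them.
try:
    old[['A', 'Z']]
    strict_raised = False
except KeyError:
    strict_raised = True
print(strict_raised)
```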

Solution 6 - Python

Generic functional form

def select_columns(data_frame, column_names):
    new_frame = data_frame.loc[:, column_names]
    return new_frame

Specific for your problem above

selected_columns = ['A', 'C', 'D']
new = select_columns(old, selected_columns)
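A possible variant of the helper, sketched under two assumptions of mine that are not in the answer: that you want an explicit copy back, and that you want a readable error listing the missing names. The function name `select_columns_copy` is hypothetical.

```python
import pandas as pd


def select_columns_copy(data_frame, column_names):
    """Return an independent copy of data_frame restricted to column_names."""
    # Collect requested names that are not columns of the frame.
    missing = [name for name in column_names if name not in data_frame.columns]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    # .loc with a list of labels keeps the requested order; .copy() makes
    # the independence from the source frame explicit.
    return data_frame.loc[:, column_names].copy()


old = pd.DataFrame({'A': [4, 5], 'B': [10, 20], 'C': [100, 50], 'D': [-30, -50]})
new = select_columns_copy(old, ['A', 'C', 'D'])
print(new)
```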

Solution 7 - Python

If you want to have a new data frame then:

import pandas as pd
old = pd.DataFrame({'A' : [4,5], 'B' : [10,20], 'C' : [100,50], 'D' : [-30,-50]})
new = old[['A', 'C', 'D']]

Solution 8 - Python

You can drop the unwanted labels from the column index and select with what remains:

df = pd.DataFrame({'A': [1, 1], 'B': [2, 2], 'C': [3, 3], 'D': [4, 4]})

df[df.columns.drop(['B', 'C'])]

or

df.loc[:, df.columns.drop(['B', 'C'])]

Output:

   A  D
0  1  4
1  1  4
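A short sketch of how Index.drop behaves on its own, using the same `df`. By default it raises if a label is absent; passing errors='ignore' skips absent labels instead. 'Z' is a made-up label used only to show that.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1], 'B': [2, 2], 'C': [3, 3], 'D': [4, 4]})

# Index.drop returns a new Index without the given labels.
kept = df.columns.drop(['B', 'C'])
new = df[kept].copy()  # copy so that new can be modified freely
print(list(kept))

# With errors='ignore', labels that are not present are skipped.
kept_lenient = df.columns.drop(['B', 'Z'], errors='ignore')
print(list(kept_lenient))
```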

Solution 9 - Python

As an alternative:

new = pd.DataFrame().assign(A=old['A'], C=old['C'], D=old['D'])

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type         Original Author        Original Content on Stackoverflow
Question             SpeedCoder5            View Question on Stackoverflow
Solution 1 - Python  johnchase              View Answer on Stackoverflow
Solution 2 - Python  stidmatt               View Answer on Stackoverflow
Solution 3 - Python  Hit                    View Answer on Stackoverflow
Solution 4 - Python  sailfish009            View Answer on Stackoverflow
Solution 5 - Python  Ellen                  View Answer on Stackoverflow
Solution 6 - Python  Deslin Naidoo          View Answer on Stackoverflow
Solution 7 - Python  Ali.E                  View Answer on Stackoverflow
Solution 8 - Python  Mykola Zotko           View Answer on Stackoverflow
Solution 9 - Python  Dimitris Paraschakis   View Answer on Stackoverflow