df.drop if it exists

Python Problem Overview

Below is a function that takes a file and drops column names 'row_num", 'start_date', 'end_date.'

The problem is not every file has each of these column names, so the function returns an error.

My goal is to alter code so that it removes these columns if it exists but does not return an error if the column does not exist.

def read_df(file):
    df = pd.read_csv(file, na_values=['', ' '])
    # Drop useless junk and fill empty values with zero 
    df = df.drop(['row_num','start_date','end_date','symbol'], axis=1).fillna(0)
    df=df[df!=0][:-1].dropna().append(df.iloc[-1])
    return df

Python Solutions

Solution 1 - Python

Add parameter errors to DataFrame.drop:

>errors : {'ignore', 'raise'}, default 'raise' > >If 'ignore', suppress error and only existing labels are dropped.

df = df.drop(['row_num','start_date','end_date','symbol'], axis=1, errors='ignore')

Sample:

df = pd.DataFrame({'row_num':[1,2], 'w':[3,4]})
df = df.drop(['row_num','start_date','end_date','symbol'], axis=1, errors='ignore')
print (df)
   w
0  3
1  4

Solution 2 - Python

In my tests the following was at least as fast as any of the given answers:

candidates=['row_num','start_date','end_date','symbol']
df = df.drop([x for x in candidates if x in df.columns], axis=1)

It has the benefit of readability and (with a small tweak to the code) the ability to record exactly which columns existed/were dropped when.

Some reasons this might be more desireable than the previous solutions:

Looping over the items and dropping each column individually if it exists is functional, but quite slow (see benchmarks below).
jezrael's answer is very nice, but made me nervous at first (ignoring errors feels bad!). Further looking at the documentation makes it sounds like this is OK though, and simply ignores the error of the column not existing (not other errors that might be undesireable to ignore). My solution may be more readable, especially for those less familiar with optional kwargs in pandas.

Benchmark Results:

Code for benchmark tests (credit to an answer in this question for how to create this sort of benchmark):

import math
from simple_benchmark import benchmark
import pandas as pd

# setting up the toy df:
def df_creator(length):
    c1=list(range(0,10))
    c2=list('a,b,c,d,e'.split(','))
    c3=list(range(0,5))
    c4=[True,False]
    lists=[c1,c2,c3,c4]
    df=pd.DataFrame()
    count=0
    for x in lists:
        count+=1
        df['col'+str(count)]=x*math.floor(length/len(x))
    return df

# setting up benchmark test:
def list_comp(df,candidates=['col1','col2','col5','col8']):
    return df.drop([x for x in candidates if x in df.columns], axis=1)

def looper(df,candidates=['col1','col2','col5','col8']):
    for col in candidates:
        if col in df.columns:
            out = df.drop(columns=col, axis=1)
    return out

def ignore_error(df,candidates=['col1','col2','col5','col8']):
    return df.drop(candidates, axis=1, errors='ignore')

functions=[list_comp,looper,ignore_error]

args={n : df_creator(n) for n in [10,100,1000,10000,100000]}
argname='df_length'
b=benchmark(functions,args,argname)
b.plot()

Solution 3 - Python

Just use Pandas Filter, the Pythonic Way

Oddly, No answers use the pandas dataframe filter method

thisFilter = df.filter(drop_list)
df.drop(thisFilter, inplace=True, axis=1)

This will create a filter from the drop_list that exists in df, then drop thisFilter from the df inplace on axis=1

i.e., drop the columns that match the drop_list and don't error if they are nonexistent

Solution 4 - Python

I just had to do this; here's what I did:

# Drop these columns if they exist
cols = ['Billing Address Street 1', 'Billing Address Street 2','Billing Company']
for col in cols:
    if col in df.columns:
        df = df.drop(columns=col, axis=1)

Might not be the best way, but it served it's purpose.

Solution 5 - Python

x = ['row_num','start_date','end_date','symbol']

To check if column exists then You can do:

for i in x:
    if i in df:
        df = df.drop(['row_num','start_date','end_date','symbol'], axis=1).fillna(0)

for i in x:
    if i in df.columns:
        df = df.drop(['row_num','start_date','end_date','symbol'], axis=1).fillna(0)

Content Type	Original Author	Original Content on Stackoverflow
Question	ZJAY	View Question on Stackoverflow
Solution 1 - Python	jezrael	View Answer on Stackoverflow
Solution 2 - Python	amquack	View Answer on Stackoverflow
Solution 3 - Python	Timothy L.J. Stewart	View Answer on Stackoverflow
Solution 4 - Python	Coffee and Code	View Answer on Stackoverflow
Solution 5 - Python	Khakhar Shyam	View Answer on Stackoverflow