How to drop rows from a pandas DataFrame that contain a particular string in a particular column?

Python, Pandas

Python Problem Overview


I have a very large data frame in Python and I want to drop all rows that have a particular string inside a particular column.

For example, I want to drop all rows which have the string "XYZ" as a substring in column C of the data frame.

Can this be implemented in an efficient way using the .drop() method?

Python Solutions


Solution 1 - Python

pandas has vectorized string operations, so you can just filter out the rows that contain the string you don't want:

In [91]: df = pd.DataFrame(dict(A=[5,3,5,6], C=["foo","bar","fooXYZbar", "bat"]))

In [92]: df
Out[92]:
   A          C
0  5        foo
1  3        bar
2  5  fooXYZbar
3  6        bat

In [93]: df[~df.C.str.contains("XYZ")]
Out[93]:
   A    C
0  5  foo
1  3  bar
3  6  bat
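
Since the question asks about .drop(), here is a minimal sketch of the same filter written both ways. The regex=False argument is an assumption added here so that "XYZ" is treated as a literal substring rather than a regular expression; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(dict(A=[5, 3, 5, 6], C=["foo", "bar", "fooXYZbar", "bat"]))

# Boolean mask: True where column C contains the literal substring "XYZ".
mask = df["C"].str.contains("XYZ", regex=False)

# Option 1: keep the rows where the mask is False.
filtered = df[~mask]

# Option 2: pass the index labels of the matching rows to .drop().
dropped = df.drop(df[mask].index)

print(filtered.equals(dropped))  # True
```

Both forms build the same boolean mask first, so .drop() brings no obvious advantage here; the mask-based indexing is simply shorter.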

Solution 2 - Python

If you need to match against several strings rather than just one, you can drop the corresponding rows with:

df = df[~df['your column'].isin(['list of strings'])]

This drops every row whose value in that column exactly equals one of the elements of the list. It does not match substrings.
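
A small sketch of what isin() does and does not remove, using made-up sample data. Note that isin() compares whole cell values.

```python
import pandas as pd

df = pd.DataFrame({"C": ["foo", "bar", "fooXYZbar", "XYZ"]})

# Only the cell that is exactly "XYZ" is dropped; "fooXYZbar" stays
# because isin() does not look for substrings.
exact = df[~df["C"].isin(["XYZ", "bat"])]

print(exact["C"].tolist())  # ['foo', 'bar', 'fooXYZbar']
```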

Solution 3 - Python

The isin() approach in Solution 2 only works if you want to compare exact strings. It will not work if you want to check whether the column value contains any of the strings in the list.

To match substrings from a list, join them into a single regular expression:

searchfor = ['john', 'doe']
df = df[~df.col.str.contains('|'.join(searchfor))]
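
Because str.contains() treats the joined pattern as a regular expression by default, search terms containing characters such as "." or "|" can match more than intended. One possible safeguard, sketched here with made-up data, is to escape each term with re.escape before joining:

```python
import re

import pandas as pd

df = pd.DataFrame({"col": ["john smith", "jane doe", "a.b", "axb", "mary"]})
searchfor = ["john", "doe", "a.b"]

# Escape each term so that "." is matched literally instead of as "any character".
pattern = "|".join(re.escape(s) for s in searchfor)

result = df[~df["col"].str.contains(pattern)]

print(result["col"].tolist())  # ['axb', 'mary']
```

Without the escaping, the term "a.b" would also match "axb" and that row would be dropped as well.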

Solution 4 - Python

A slight modification to the code in Solution 1: passing na=False makes missing values (NaN) count as non-matches, so those rows are kept. Without it, a column that contains NaN can raise TypeError: bad operand type for unary ~: 'float'

df[~df.C.str.contains("XYZ", na=False)]

Source: https://stackoverflow.com/questions/52297740/typeerror-bad-operand-type-for-unary-float
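
A minimal sketch of the na=False behaviour on a column with a missing value (the sample data is made up):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "C": ["foo", np.nan, "fooXYZbar"]})

# With na=False, NaN is reported as "does not contain", so the mask
# holds only True/False and can be negated with ~.
mask = df["C"].str.contains("XYZ", na=False)

kept = df[~mask]  # the NaN row is kept, the "fooXYZbar" row is dropped

print(kept["A"].tolist())  # [1, 2]
```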

Solution 5 - Python

Solution 6 - Python

The code below returns all rows whose value in column C is not exactly 'XYZ'. This is an exact comparison, not a substring match:

df[df['C'] != 'XYZ']

To store the result in a new data frame:

newdf = df[df['C'] != 'XYZ']

Solution 7 - Python

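If you want to keep the rows where the column is NaN, compare the result with True before negating. The parentheses matter because ~ binds more tightly than ==:

df[~(df.C.str.contains("XYZ") == True)]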
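
A sketch of both NaN choices side by side, with the == True comparison wrapped in parentheses so that it is evaluated before the negation. The na=True variant is an addition here, shown for the opposite case where NaN rows should be dropped along with the matches.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "C": ["foo", np.nan, "fooXYZbar"]})

# Compare with True first (NaN == True is False), then negate the boolean result.
keep_nan = df[~(df["C"].str.contains("XYZ") == True)]

# Treat NaN as a match so that those rows are dropped together with the matches.
drop_nan = df[~df["C"].str.contains("XYZ", na=True)]

print(keep_nan["A"].tolist())  # [1, 2]
print(drop_nan["A"].tolist())  # [1]
```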

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stack Overflow
Question | London guy | View Question on Stack Overflow
Solution 1 - Python | Brian from QuantRocket | View Answer on Stack Overflow
Solution 2 - Python | Kenan | View Answer on Stack Overflow
Solution 3 - Python | Rupert Schiessl | View Answer on Stack Overflow
Solution 4 - Python | Devarshi Mandal | View Answer on Stack Overflow
Solution 5 - Python | Amy Annine | View Answer on Stack Overflow
Solution 6 - Python | ak3191 | View Answer on Stack Overflow
Solution 7 - Python | Zhou Ruohua | View Answer on Stack Overflow