Deleting DataFrame row in Pandas based on column value

PythonPandas

Python Problem Overview


I have the following DataFrame:

             daysago  line_race rating        rw    wrating
 line_date                                                 
 2007-03-31       62         11     56  1.000000  56.000000
 2007-03-10       83         11     67  1.000000  67.000000
 2007-02-10      111          9     66  1.000000  66.000000
 2007-01-13      139         10     83  0.880678  73.096278
 2006-12-23      160         10     88  0.793033  69.786942
 2006-11-09      204          9     52  0.636655  33.106077
 2006-10-22      222          8     66  0.581946  38.408408
 2006-09-29      245          9     70  0.518825  36.317752
 2006-09-16      258         11     68  0.486226  33.063381
 2006-08-30      275          8     72  0.446667  32.160051
 2006-02-11      475          5     65  0.164591  10.698423
 2006-01-13      504          0     70  0.142409   9.968634
 2006-01-02      515          0     64  0.134800   8.627219
 2005-12-06      542          0     70  0.117803   8.246238
 2005-11-29      549          0     70  0.113758   7.963072
 2005-11-22      556          0     -1  0.109852  -0.109852
 2005-11-01      577          0     -1  0.098919  -0.098919
 2005-10-20      589          0     -1  0.093168  -0.093168
 2005-09-27      612          0     -1  0.083063  -0.083063
 2005-09-07      632          0     -1  0.075171  -0.075171
 2005-06-12      719          0     69  0.048690   3.359623
 2005-05-29      733          0     -1  0.045404  -0.045404
 2005-05-02      760          0     -1  0.039679  -0.039679
 2005-04-02      790          0     -1  0.034160  -0.034160
 2005-03-13      810          0     -1  0.030915  -0.030915
 2004-11-09      934          0     -1  0.016647  -0.016647

I need to remove the rows where line_race is equal to 0. What's the most efficient way to do this?

Python Solutions


Solution 1 - Python

If I'm understanding correctly, it should be as simple as:

df = df[df.line_race != 0]

Solution 2 - Python

But for any future bypassers you could mention that df = df[df.line_race != 0] doesn't do anything when trying to filter for None/missing values.

Does work:

df = df[df.line_race != 0]

Doesn't do anything:

df = df[df.line_race != None]

Does work:

df = df[df.line_race.notnull()]

Solution 3 - Python

just to add another solution, particularly useful if you are using the new pandas assessors, other solutions will replace the original pandas and lose the assessors

df.drop(df.loc[df['line_race']==0].index, inplace=True)

Solution 4 - Python

If you want to delete rows based on multiple values of the column, you could use:

df[(df.line_race != 0) & (df.line_race != 10)]

To drop all rows with values 0 and 10 for line_race.

Solution 5 - Python

The best way to do this is with boolean masking:

In [56]: df
Out[56]:
     line_date  daysago  line_race  rating    raw  wrating
0   2007-03-31       62         11      56  1.000   56.000
1   2007-03-10       83         11      67  1.000   67.000
2   2007-02-10      111          9      66  1.000   66.000
3   2007-01-13      139         10      83  0.881   73.096
4   2006-12-23      160         10      88  0.793   69.787
5   2006-11-09      204          9      52  0.637   33.106
6   2006-10-22      222          8      66  0.582   38.408
7   2006-09-29      245          9      70  0.519   36.318
8   2006-09-16      258         11      68  0.486   33.063
9   2006-08-30      275          8      72  0.447   32.160
10  2006-02-11      475          5      65  0.165   10.698
11  2006-01-13      504          0      70  0.142    9.969
12  2006-01-02      515          0      64  0.135    8.627
13  2005-12-06      542          0      70  0.118    8.246
14  2005-11-29      549          0      70  0.114    7.963
15  2005-11-22      556          0      -1  0.110   -0.110
16  2005-11-01      577          0      -1  0.099   -0.099
17  2005-10-20      589          0      -1  0.093   -0.093
18  2005-09-27      612          0      -1  0.083   -0.083
19  2005-09-07      632          0      -1  0.075   -0.075
20  2005-06-12      719          0      69  0.049    3.360
21  2005-05-29      733          0      -1  0.045   -0.045
22  2005-05-02      760          0      -1  0.040   -0.040
23  2005-04-02      790          0      -1  0.034   -0.034
24  2005-03-13      810          0      -1  0.031   -0.031
25  2004-11-09      934          0      -1  0.017   -0.017

In [57]: df[df.line_race != 0]
Out[57]:
     line_date  daysago  line_race  rating    raw  wrating
0   2007-03-31       62         11      56  1.000   56.000
1   2007-03-10       83         11      67  1.000   67.000
2   2007-02-10      111          9      66  1.000   66.000
3   2007-01-13      139         10      83  0.881   73.096
4   2006-12-23      160         10      88  0.793   69.787
5   2006-11-09      204          9      52  0.637   33.106
6   2006-10-22      222          8      66  0.582   38.408
7   2006-09-29      245          9      70  0.519   36.318
8   2006-09-16      258         11      68  0.486   33.063
9   2006-08-30      275          8      72  0.447   32.160
10  2006-02-11      475          5      65  0.165   10.698

UPDATE: Now that pandas 0.13 is out, another way to do this is df.query('line_race != 0').

Solution 6 - Python

In case of multiple values and str dtype

I used the following to filter out given values in a col:

def filter_rows_by_values(df, col, values):
    return df[~df[col].isin(values)]

Example:

In a DataFrame I want to remove rows which have values "b" and "c" in column "str"

df = pd.DataFrame({"str": ["a","a","a","a","b","b","c"], "other": [1,2,3,4,5,6,7]})
df
   str	other
0	a	1
1	a	2
2	a	3
3	a	4
4	b	5
5	b	6
6	c	7

filter_rows_by_values(df, "str", ["b","c"])

   str	other
0	a	1
1	a	2
2	a	3
3	a	4

Solution 7 - Python

Though the previous answer are almost similar to what I am going to do, but using the index method does not require using another indexing method .loc(). It can be done in a similar but precise manner as

df.drop(df.index[df['line_race'] == 0], inplace = True)

Solution 8 - Python

The given answer is correct nontheless as someone above said you can use df.query('line_race != 0') which depending on your problem is much faster. Highly recommend.

Solution 9 - Python

One of the efficient and pandaic way is using eq() method:

df[~df.line_race.eq(0)]

Solution 10 - Python

Another way of doing it. May not be the most efficient way as the code looks a bit more complex than the code mentioned in other answers, but still alternate way of doing the same thing.

  df = df.drop(df[df['line_race']==0].index)

Solution 11 - Python

Adding one more way to do this.

 df = df.query("line_race!=0")

Solution 12 - Python

I compiled and run my code. This is accurate code. You can try it your own.

data = pd.read_excel('file.xlsx')

If you have any special character or space in column name you can write it in '' like in the given code:

data = data[data['expire/t'].notnull()]
print (date)

If there is just a single string column name without any space or special character you can directly access it.

data = data[data.expire ! = 0]
print (date)

Solution 13 - Python

Just adding another way for DataFrame expanded over all columns:

for column in df.columns:
   df = df[df[column]!=0]

Example:

def z_score(data,count):
   threshold=3
   for column in data.columns:
	   mean = np.mean(data[column])
	   std = np.std(data[column])
	   for i in data[column]:
		   zscore = (i-mean)/std
		   if(np.abs(zscore)>threshold):
			   count=count+1
			   data = data[data[column]!=i]
   return data,count

Solution 14 - Python

Just in case you need to delete the row, but the value can be in different columns. In my case I was using percentages so I wanted to delete the rows which has a value 1 in any column, since that means that it's the 100%

for x in df:
    df.drop(df.loc[df[x]==1].index, inplace=True)

Is not optimal if your df have too many columns.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionTravisVOXView Question on Stackoverflow
Solution 1 - PythontshauckView Answer on Stackoverflow
Solution 2 - Pythonwonderkid2View Answer on Stackoverflow
Solution 3 - PythondesmondView Answer on Stackoverflow
Solution 4 - PythonRobvhView Answer on Stackoverflow
Solution 5 - PythonPhillip CloudView Answer on Stackoverflow
Solution 6 - PythonMo_OfficalView Answer on Stackoverflow
Solution 7 - PythonLoochieView Answer on Stackoverflow
Solution 8 - Pythonh3h325View Answer on Stackoverflow
Solution 9 - PythonashkanghView Answer on Stackoverflow
Solution 10 - PythonAmruth LakkavaramView Answer on Stackoverflow
Solution 11 - PythonTufail WarisView Answer on Stackoverflow
Solution 12 - PythonUzairView Answer on Stackoverflow
Solution 13 - PythonPrateek Kumar SinghView Answer on Stackoverflow
Solution 14 - Pythonjuan escorciaView Answer on Stackoverflow