Drop rows with all zeros in pandas data frame
PythonPandasPython Problem Overview
I can use pandas
dropna()
functionality to remove rows with some or all columns set as NA
's. Is there an equivalent function for dropping rows with all columns having value 0?
P kt b tt mky depth
1 0 0 0 0 0
2 0 0 0 0 0
3 0 0 0 0 0
4 0 0 0 0 0
5 1.1 3 4.5 2.3 9.0
In this example, we would like to drop the first 4 rows from the data frame.
thanks!
Python Solutions
Solution 1 - Python
One-liner. No transpose needed:
df.loc[~(df==0).all(axis=1)]
And for those who like symmetry, this also works...
df.loc[(df!=0).any(axis=1)]
Solution 2 - Python
It turns out this can be nicely expressed in a vectorized fashion:
> df = pd.DataFrame({'a':[0,0,1,1], 'b':[0,1,0,1]})
> df = df[(df.T != 0).any()]
> df
a b
1 0 1
2 1 0
3 1 1
Solution 3 - Python
I think this solution is the shortest :
df= df[df['ColName'] != 0]
Solution 4 - Python
I look up this question about once a month and always have to dig out the best answer from the comments:
df.loc[(df!=0).any(1)]
Thanks Dan Allan!
Solution 5 - Python
Replace the zeros with nan
and then drop the rows with all entries as nan
.
After that replace nan
with zeros.
import numpy as np
df = df.replace(0, np.nan)
df = df.dropna(how='all', axis=0)
df = df.replace(np.nan, 0)
Solution 6 - Python
Couple of solutions I found to be helpful while looking this up, especially for larger data sets:
df[(df.sum(axis=1) != 0)] # 30% faster
df[df.values.sum(axis=1) != 0] # 3X faster
Continuing with the example from @U2EF1:
In [88]: df = pd.DataFrame({'a':[0,0,1,1], 'b':[0,1,0,1]})
In [91]: %timeit df[(df.T != 0).any()]
1000 loops, best of 3: 686 µs per loop
In [92]: df[(df.sum(axis=1) != 0)]
Out[92]:
a b
1 0 1
2 1 0
3 1 1
In [95]: %timeit df[(df.sum(axis=1) != 0)]
1000 loops, best of 3: 495 µs per loop
In [96]: %timeit df[df.values.sum(axis=1) != 0]
1000 loops, best of 3: 217 µs per loop
On a larger dataset:
In [119]: bdf = pd.DataFrame(np.random.randint(0,2,size=(10000,4)))
In [120]: %timeit bdf[(bdf.T != 0).any()]
1000 loops, best of 3: 1.63 ms per loop
In [121]: %timeit bdf[(bdf.sum(axis=1) != 0)]
1000 loops, best of 3: 1.09 ms per loop
In [122]: %timeit bdf[bdf.values.sum(axis=1) != 0]
1000 loops, best of 3: 517 µs per loop
Solution 7 - Python
You can use a quick lambda
function to check if all the values in a given row are 0
. Then you can use the result of applying that lambda
as a way to choose only the rows that match or don't match that condition:
import pandas as pd
import numpy as np
np.random.seed(0)
df = pd.DataFrame(np.random.randn(5,3),
index=['one', 'two', 'three', 'four', 'five'],
columns=list('abc'))
df.loc[['one', 'three']] = 0
print df
print df.loc[~df.apply(lambda row: (row==0).all(), axis=1)]
Yields:
a b c
one 0.000000 0.000000 0.000000
two 2.240893 1.867558 -0.977278
three 0.000000 0.000000 0.000000
four 0.410599 0.144044 1.454274
five 0.761038 0.121675 0.443863
[5 rows x 3 columns]
a b c
two 2.240893 1.867558 -0.977278
four 0.410599 0.144044 1.454274
five 0.761038 0.121675 0.443863
[3 rows x 3 columns]
Solution 8 - Python
import pandas as pd
df = pd.DataFrame({'a' : [0,0,1], 'b' : [0,0,-1]})
temp = df.abs().sum(axis=1) == 0
df = df.drop(temp)
Result:
>>> df
a b
2 1 -1
Solution 9 - Python
Another alternative:
# Is there anything in this row non-zero?
# df != 0 --> which entries are non-zero? T/F
# (df != 0).any(axis=1) --> are there 'any' entries non-zero row-wise? T/F of rows that return true to this statement.
# df.loc[all_zero_mask,:] --> mask your rows to only show the rows which contained a non-zero entry.
# df.shape to confirm a subset.
all_zero_mask=(df != 0).any(axis=1) # Is there anything in this row non-zero?
df.loc[all_zero_mask,:].shape
Solution 10 - Python
this works for me
new_df = df[df.loc[:]!=0].dropna()
Solution 11 - Python
Following the example in the accepted answer, a more elegant solution:
df = pd.DataFrame({'a':[0,0,1,1], 'b':[0,1,0,1]})
df = df[df.any(axis=1)]
print(df)
a b
1 0 1
2 1 0
3 1 1
Solution 12 - Python
For me this code: df.loc[(df!=0).any(axis=0)]
did not work. It returned the exact dataset.
Instead, I used df.loc[:, (df!=0).any(axis=0)]
and dropped all the columns with 0 values in the dataset
The function .all()
droped all the columns in which are any zero values in my dataset.
Solution 13 - Python
df = df [~( df [ ['kt' 'b' 'tt' 'mky' 'depth', ] ] == 0).all(axis=1) ]
Try this command its perfectly working.
Solution 14 - Python
To drop all columns with values 0 in any row:
new_df = df[df.loc[:]!=0].dropna()