Pandas: drop columns with all NaN's

PythonPandasDataframeIn Place

Python Problem Overview


I realize that dropping NaNs from a dataframe is as easy as df.dropna but for some reason that isn't working on mine and I'm not sure why.

Here is my original dataframe:

fish_frame1:                       0   1   2         3   4       5   6          7
0               #0915-8 NaN NaN       NaN NaN     NaN NaN        NaN
1                   NaN NaN NaN  LIVE WGT NaN  AMOUNT NaN      TOTAL
2               GBW COD NaN NaN     2,280 NaN   $0.60 NaN  $1,368.00
3               POLLOCK NaN NaN     1,611 NaN   $0.01 NaN     $16.11
4                 WHAKE NaN NaN       441 NaN   $0.70 NaN    $308.70
5           GBE HADDOCK NaN NaN     2,788 NaN   $0.01 NaN     $27.88
6           GBW HADDOCK NaN NaN    16,667 NaN   $0.01 NaN    $166.67
7               REDFISH NaN NaN       932 NaN   $0.01 NaN      $9.32
8    GB WINTER FLOUNDER NaN NaN       145 NaN   $0.25 NaN     $36.25
9   GOM WINTER FLOUNDER NaN NaN    25,070 NaN   $0.35 NaN  $8,774.50
10        GB YELLOWTAIL NaN NaN        26 NaN   $1.75 NaN     $45.50

The code that follows is an attempt to drop all NaNs as well as any columns with more than 3 NaNs (either one, or both, should work I think):

fish_frame.dropna()
fish_frame.dropna(thresh=len(fish_frame) - 3, axis=1)

This produces:

fish_frame1 after dropna:                       0   1   2         3   4       5   6          7
0               #0915-8 NaN NaN       NaN NaN     NaN NaN        NaN
1                   NaN NaN NaN  LIVE WGT NaN  AMOUNT NaN      TOTAL
2               GBW COD NaN NaN     2,280 NaN   $0.60 NaN  $1,368.00
3               POLLOCK NaN NaN     1,611 NaN   $0.01 NaN     $16.11
4                 WHAKE NaN NaN       441 NaN   $0.70 NaN    $308.70
5           GBE HADDOCK NaN NaN     2,788 NaN   $0.01 NaN     $27.88
6           GBW HADDOCK NaN NaN    16,667 NaN   $0.01 NaN    $166.67
7               REDFISH NaN NaN       932 NaN   $0.01 NaN      $9.32
8    GB WINTER FLOUNDER NaN NaN       145 NaN   $0.25 NaN     $36.25
9   GOM WINTER FLOUNDER NaN NaN    25,070 NaN   $0.35 NaN  $8,774.50
10        GB YELLOWTAIL NaN NaN        26 NaN   $1.75 NaN     $45.50

I'm a novice with Pandas so I'm not sure if this isn't working because I'm doing something wrong or I'm misunderstanding something or misusing a function. Any help is appreciated thanks.

Python Solutions


Solution 1 - Python

From the dropna docstring:

Drop the columns where all elements are NaN:
df.dropna(axis=1, how='all')


   A    B    D
0  NaN  2.0  0
1  3.0  4.0  1
2  NaN  NaN  5

Solution 2 - Python

dropna() drops the null values and returns a dataFrame. Assign it back to the original dataFrame.

fish_frame = fish_frame.dropna(axis = 1, how = 'all')

Referring to your code:

fish_frame.dropna(thresh=len(fish_frame) - 3, axis=1)

This would drop columns with 7 or more NaN's (assuming len(df) = 10), if you want to drop columns with more than 3 Nan's like you've mentioned, thresh should be equal to 3.

Solution 3 - Python

dropna() by default returns a dataframe (defaults to inplace=False behavior) and thus needs to be assigned to a new dataframe for it to stay in your code.

So for example,

fish_frame = fish_frame.dropna()

As to why your dropna is returning an empty dataframe, I'd recommend you look at the "how" argument in the dropna method (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html). Also bear in mind, axis=0 corresponds to columns, and axis=1 corresponds to rows.

So to remove columns with all "NAs", axis=0, how="any" should do the trick:

fish_frame = fish_frame.dropna(axis=0, how="any")

Finally, the "thresh" argument designates explicitly how many NA's are necessary for a drop to occur. So

fish_frame = fish_frame.dropna(axis=0, thresh=3, how="any") 

should work fine and dandy to remove any column with three NA's.

Also, as Corley pointed out, how="any" is the default and is thus not necessary.

Solution 4 - Python

Another solution would be to create a boolean dataframe with True values at not-null positions and then take the columns having at least one True value. Below line removes columns with all NaN values.

df = df.loc[:,df.notna().any(axis=0)]

If you want to remove columns having at least one missing (NaN) value;

df = df.loc[:,df.notna().all(axis=0)]

This approach is particularly useful in removing columns containing empty strings, zeros or basically any given value. For example;

df = df.loc[:,(df!='').all(axis=0)]

removes columns having at least one empty string.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestiontheprowlerView Question on Stackoverflow
Solution 1 - PythonCorley BrigmanView Answer on Stackoverflow
Solution 2 - PythonRakesh AdhikesavanView Answer on Stackoverflow
Solution 3 - PythonSeeDerekEngineerView Answer on Stackoverflow
Solution 4 - PythonAchintha IhalageView Answer on Stackoverflow