How to select rows with NaN in particular column?

Python, Pandas, Dataframe

Python Problem Overview


Given this DataFrame, how can I select only the rows where "Col2" is NaN?

df = pd.DataFrame([range(3), [0, np.nan, 0], [0, 0, np.nan], range(3), range(3)], columns=["Col1", "Col2", "Col3"])

which looks like:

   Col1  Col2  Col3
0     0   1.0   2.0
1     0   NaN   0.0
2     0   0.0   NaN
3     0   1.0   2.0
4     0   1.0   2.0

The result should be:

   Col1  Col2  Col3
1     0   NaN   0.0

Python Solutions


Solution 1 - Python

Try the following:

df[df['Col2'].isnull()]
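In pandas, `isnull` is an alias of `isna`, so either spelling builds the same boolean mask. A minimal sketch on the question's DataFrame, built here with `np.nan` (the `np.NaN` alias was removed in NumPy 2.0):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [range(3), [0, np.nan, 0], [0, 0, np.nan], range(3), range(3)],
    columns=["Col1", "Col2", "Col3"],
)

# Boolean mask: True where Col2 is missing
mask = df["Col2"].isna()  # same result as df["Col2"].isnull()
rows = df[mask]
print(rows)
```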

Solution 2 - Python

@qbzenker provided the most idiomatic method, IMO.

Here are a few alternatives:

In [28]: df.query('Col2 != Col2') # Using the fact that: np.nan != np.nan
Out[28]:
   Col1  Col2  Col3
1     0   NaN   0.0

In [29]: df[np.isnan(df.Col2)]
Out[29]:
   Col1  Col2  Col3
1     0   NaN   0.0
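Because NaN is the only float value that compares unequal to itself, `Col2 != Col2` is True exactly on the missing entries. A quick sketch checking that both alternatives select the same rows as `isna`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [range(3), [0, np.nan, 0], [0, 0, np.nan], range(3), range(3)],
    columns=["Col1", "Col2", "Col3"],
)

print(np.nan != np.nan)  # True: NaN is never equal to itself

by_query = df.query("Col2 != Col2")
by_isnan = df[np.isnan(df["Col2"])]  # works here because Col2 is a float column
by_isna = df[df["Col2"].isna()]

print(by_query.equals(by_isna), by_isnan.equals(by_isna))
```

One design note: `np.isnan` only accepts numeric input, so it raises a `TypeError` on, for example, a column of strings, whereas `isna` works on any dtype.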

Solution 3 - Python

If you want to select rows with at least one NaN value, then you could use isna + any on axis=1:

df[df.isna().any(axis=1)]
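On the question's DataFrame, this picks up the NaN in Col2 (row 1) as well as the one in Col3 (row 2). A short sketch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [range(3), [0, np.nan, 0], [0, 0, np.nan], range(3), range(3)],
    columns=["Col1", "Col2", "Col3"],
)

# Per-cell NaN flags, then "is any flag set?" across each row (axis=1)
row_has_nan = df.isna().any(axis=1)
print(df[row_has_nan])
```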

If you want to select rows with a certain number of NaN values, you could use isna + sum on axis=1 to count them per row, then compare the counts with > (or gt). For example, the following fetches rows with at least 2 NaN values:

df[df.isna().sum(axis=1) > 1]
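The question's DataFrame has no row with two NaNs, so this sketch uses a small hypothetical frame (columns "A", "B", "C" and index labels are made up for illustration) to show the counting step and the `gt` form of the comparison:

```python
import numpy as np
import pandas as pd

# Hypothetical data: row "b" has two NaNs, row "c" has one
df = pd.DataFrame(
    {"A": [1.0, np.nan, np.nan], "B": [2.0, np.nan, 3.0], "C": [4.0, 5.0, 6.0]},
    index=["a", "b", "c"],
)

nan_counts = df.isna().sum(axis=1)  # NaN count per row: a -> 0, b -> 2, c -> 1
at_least_two = df[nan_counts.gt(1)]  # same as nan_counts > 1
print(at_least_two)
```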

If you want to limit the check to specific columns, you could select them first, then check:

df[df[['Col1', 'Col2']].isna().any(axis=1)]
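Restricting the check to Col1 and Col2 means the NaN in Col3 (row 2) no longer matches, so only row 1 comes back. A minimal sketch on the question's data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [range(3), [0, np.nan, 0], [0, 0, np.nan], range(3), range(3)],
    columns=["Col1", "Col2", "Col3"],
)

# Only Col1 and Col2 are inspected, so the NaN in Col3 is ignored
subset_mask = df[["Col1", "Col2"]].isna().any(axis=1)
print(df[subset_mask])
```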

If you want to select rows with all NaN values, you could use isna + all on axis=1:

df[df.isna().all(axis=1)]
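No row of the question's DataFrame is entirely NaN, so here is a sketch with a hypothetical two-column frame in which only row 1 is fully missing:

```python
import numpy as np
import pandas as pd

# Hypothetical data: rows 0 and 2 have one value each, row 1 has none
df = pd.DataFrame({"A": [1.0, np.nan, np.nan], "B": [np.nan, np.nan, 2.0]})

# all(axis=1) is True only when every cell in the row is NaN
all_missing = df[df.isna().all(axis=1)]
print(all_missing)
```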

If you want to select rows with no NaN values, you could use notna + all on axis=1:

df[df.notna().all(axis=1)]

This is equivalent to:

df[df['Col1'].notna() & df['Col2'].notna() & df['Col3'].notna()]
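A quick sketch confirming the two forms select the same complete rows (0, 3 and 4) on the question's data. It also compares against `dropna()`, which with its default arguments drops any row containing a NaN; that comparison is an addition for illustration, not part of the original answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [range(3), [0, np.nan, 0], [0, 0, np.nan], range(3), range(3)],
    columns=["Col1", "Col2", "Col3"],
)

complete = df[df.notna().all(axis=1)]
explicit = df[df["Col1"].notna() & df["Col2"].notna() & df["Col3"].notna()]

print(complete.equals(explicit))  # the two masks select identical rows
print(complete.equals(df.dropna()))  # dropna() with defaults keeps the same rows
```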

which could become tedious if there are many columns. Instead, you could use functools.reduce to chain & operators:

import functools
import operator
df[functools.reduce(operator.and_, (df[i].notna() for i in df.columns))]

or numpy.logical_and.reduce:

import numpy as np
df[np.logical_and.reduce([df[i].notna() for i in df.columns])]
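Both reducers fold a list of per-column masks into one row mask. This sketch builds the list once and checks that the two approaches agree on the question's data:

```python
import functools
import operator

import numpy as np
import pandas as pd

df = pd.DataFrame(
    [range(3), [0, np.nan, 0], [0, 0, np.nan], range(3), range(3)],
    columns=["Col1", "Col2", "Col3"],
)

masks = [df[c].notna() for c in df.columns]

# functools.reduce folds the list left to right: (m1 & m2) & m3
via_functools = functools.reduce(operator.and_, masks)
# np.logical_and.reduce stacks the masks into a 2-D array and ANDs along axis 0
via_numpy = np.logical_and.reduce(masks)

print(df[via_functools].index.tolist(), df[via_numpy].index.tolist())
```

Note that `functools.reduce` returns a pandas Series, while `np.logical_and.reduce` returns a plain NumPy boolean array; both work as a row selector.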

If you want to use query to filter the rows where a particular column has no NaN, you can pass the engine='python' parameter:

df.query('Col2.notna()', engine='python')

or use the fact that NaN != NaN, like @MaxU - stop WAR against UA:

df.query('Col2==Col2')
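A minimal sketch checking that both query forms keep the same rows (every row except row 1) on the question's data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [range(3), [0, np.nan, 0], [0, 0, np.nan], range(3), range(3)],
    columns=["Col1", "Col2", "Col3"],
)

# Method call inside the expression, evaluated with engine='python' as in the answer
with_method = df.query("Col2.notna()", engine="python")

# Plain self-comparison: NaN == NaN is False, so missing rows drop out
with_self_eq = df.query("Col2 == Col2")

print(with_method.equals(with_self_eq))
```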

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Dinosaurius | View Question on Stackoverflow
Solution 1 - Python | qbzenker | View Answer on Stackoverflow
Solution 2 - Python | MaxU - stop genocide of UA | View Answer on Stackoverflow
Solution 3 - Python | user7864386 | View Answer on Stackoverflow