Querying for NaN and other names in Pandas


Python Problem Overview


Say I have a DataFrame df with a column named value that holds some floats and some NaN. How can I select the rows where value is NaN using the query syntax?

The following, for example, does not work:

df.query( '(value < 10) or (value == NaN)' )

This raises "name NaN is not defined" (the same happens with df.query('value == NaN')).

More generally, is there any way to use NumPy names such as inf, nan, pi, or e inside query?

Python Solutions


Solution 1 - Python

In general, you could use @local_variable_name, so something like

>>> import numpy as np; import pandas as pd
>>> pi = np.pi; nan = np.nan
>>> df = pd.DataFrame({"value": [3,4,9,10,11,np.nan,12]})
>>> df.query("(value < 10) and (value > @pi)")
   value
1    4.0
2    9.0

would work, but NaN isn't equal to itself, so value == @nan will always be false. One way to hack around this is to exploit that very fact and use value != value as an isnan check. We have

>>> df.query("(value < 10) or (value == @nan)")
   value
0    3.0
1    4.0
2    9.0

but

>>> df.query("(value < 10) or (value != value)")
   value
0    3.0
1    4.0
2    9.0
5    NaN
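
As a minimal sketch of the general question (using NumPy names such as pi and inf inside query), the snippet below binds the constants to local variables and references them with the @ prefix, as this answer describes. The variable names above_pi and small_or_missing are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"value": [3, 4, 9, 10, 11, np.nan, 12]})

# Local names are only visible inside the query string when prefixed with "@".
pi = np.pi
inf = np.inf

# Comparisons against NaN are always False, so the NaN row drops out here.
above_pi = df.query("(value > @pi) and (value < @inf)")

# NaN != NaN is True, so "value != value" acts as an isnan check.
small_or_missing = df.query("(value < 10) or (value != value)")

print(above_pi.index.tolist())
print(small_or_missing.index.tolist())
```

Comparing against @nan directly would not select anything, because no value compares equal to NaN.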

Solution 2 - Python

According to another answer, you can use:

df.query('value < 10 | value.isnull()', engine='python')

I verified that it works.
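
A self-contained version of the same idea might look like the sketch below. It assumes the sample data from the question and spells the condition with parentheses and "or"; the boolean-mask line is only there as a cross-check.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"value": [3, 4, 9, 10, 11, np.nan, 12]})

# engine="python" lets pandas evaluate the .isnull() method call itself
# instead of handing the expression to numexpr.
result = df.query("(value < 10) or value.isnull()", engine="python")

# The same selection written as an ordinary boolean mask, for comparison.
expected = df[(df["value"] < 10) | df["value"].isnull()]

print(result.index.tolist())
```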

Solution 3 - Python

For rows where value is not null

df.query("value == value")

For rows where value is null

df.query("value != value")
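
As a quick sanity check, the sketch below (assuming the sample data from the question) shows that the two queries split the frame into non-missing and missing rows with nothing left over.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"value": [3, 4, 9, 10, 11, np.nan, 12]})

# A value equals itself unless it is NaN.
present = df.query("value == value")
# Only NaN is unequal to itself.
missing = df.query("value != value")

print(len(present), len(missing))
```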

Solution 4 - Python

You can use the isna and notna Series methods, which are concise and readable.

import pandas as pd
import numpy as np

df = pd.DataFrame({'value': [3, 4, 9, 10, 11, np.nan, 12]})
available = df.query("value.notna()")
print(available)

#    value
# 0    3.0
# 1    4.0
# 2    9.0
# 3   10.0
# 4   11.0
# 6   12.0

not_available = df.query("value.isna()")
print(not_available)

#    value
# 5    NaN

If numexpr is installed, you need to pass engine="python" for these method calls to work with .query. pandas recommends numexpr to speed up .query on larger datasets.

available = df.query("value.notna()", engine="python")
print(available)

Alternatively, you can use the top-level pd.isna function by referencing it as a local variable. Again, passing engine="python" is required when numexpr is present.

import pandas as pd
import numpy as np


df = pd.DataFrame({'value': [3, 4, 9, 10, 11, np.nan, 12]})
print(df.query("@pd.isna(value)", engine="python"))

#    value
# 5    NaN

Solution 5 - Python

Pandas fills empty cells in a DataFrame with NumPy's nan values. As it turns out, this has some funny properties. For one, nothing is equal to this kind of null, not even itself. As a result, you can't find it with an equality check.

In : 'nan' == np.nan
Out: False

In : None == np.nan
Out: False

In : np.nan == np.nan
Out: False

However, because a cell containing a np.nan value will not be equal to anything, including another np.nan value, we can check to see if it is unequal to itself.

In : np.nan != np.nan
Out: True

You can take advantage of this with the pandas query method by searching for cells where the value in a particular column is unequal to itself.

df.query('a != a')

or

df[df['a'] != df['a']]
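
The sketch below puts these pieces together on a small, made-up frame with a column a. It checks that the query form, the boolean-mask form, and the isna method select the same rows.

```python
import math

import numpy as np
import pandas as pd

x = np.nan
print(x == x)         # False: NaN is not equal to itself
print(x != x)         # True: so self-inequality identifies NaN
print(math.isnan(x))  # True: the explicit scalar check

# Illustrative data, not taken from the question.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0]})

via_query = df.query("a != a")
via_mask = df[df["a"] != df["a"]]
via_isna = df[df["a"].isna()]

print(via_query.index.tolist())
```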

Solution 6 - Python

This should also work, although it compares against the string 'NaN', so check the result on your pandas version: df.query("value == 'NaN'")

Solution 7 - Python

I think the other answers will normally be better. In one case, my query had to go through eval (use eval very carefully), and the syntax below was useful. Requiring a value to be neither less than 10 nor greater than or equal to 10 excludes every number, leaving only null-like values.

import numpy as np
import pandas as pd

df = pd.DataFrame({'value': [3, 4, 9, 10, 11, np.nan, 12]})

df.query("value < 10 or (~(value < 10) and ~(value >= 10))")
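
To see why this works, the sketch below rebuilds the same condition as explicit boolean masks (the names below, at_or_above, and neither are illustrative) and compares it with isna. Both comparisons are False for NaN, so negating both leaves only the missing rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"value": [3, 4, 9, 10, 11, np.nan, 12]})
v = df["value"]

below = v < 10         # False for NaN
at_or_above = v >= 10  # also False for NaN

# Every real number satisfies exactly one comparison; NaN satisfies neither.
neither = ~below & ~at_or_above

result = df.query("value < 10 or (~(value < 10) and ~(value >= 10))")

print(neither.tolist())
print(result.index.tolist())
```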

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stack Overflow
Question | Amelio Vazquez-Reina | View Question on Stack Overflow
Solution 1 - Python | DSM | View Answer on Stack Overflow
Solution 2 - Python | Eric Ness | View Answer on Stack Overflow
Solution 3 - Python | as - if | View Answer on Stack Overflow
Solution 4 - Python | Jarno | View Answer on Stack Overflow
Solution 5 - Python | AreToo | View Answer on Stack Overflow
Solution 6 - Python | foadgr | View Answer on Stack Overflow
Solution 7 - Python | James Page | View Answer on Stack Overflow