Filtering all rows with NaT in a column in Dataframe python

PythonPandasDataframe

Python Problem Overview


I have a df like this:

    a b           c
    1 NaT         w
    2 2014-02-01  g
    3 NaT         x   

    df=df[df.b=='2014-02-01']

will give me

    a  b          c
    2 2014-02-01  g

I want a database of all rows with NaT in column b?

   df=df[df.b==None] #Doesn't work
   

I want this:

    a b           c
    1 NaT         w
    3 NaT         x    

Python Solutions


Solution 1 - Python

isnull and notnull work with NaT so you can handle them much the same way you handle NaNs:

>>> df

   a          b  c
0  1        NaT  w
1  2 2014-02-01  g
2  3        NaT  x

>>> df.dtypes

a             int64
b    datetime64[ns]
c            object

just use isnull to select:

df[df.b.isnull()]

   a   b  c
0  1 NaT  w
2  3 NaT  x

Solution 2 - Python

Using your example dataframe:

df = pd.DataFrame({"a":[1,2,3], 
                   "b":[pd.NaT, pd.to_datetime("2014-02-01"), pd.NaT], 
                   "c":["w", "g", "x"]})

Until v0.17 this didn't use to work:

df.query('b != b') 

and you had to do:

df.query('b == "NaT"') # yes, surprisingly, this works!

Since v0.17 though, both methods work, although I would only recommend the first one.

Solution 3 - Python

For those interested, in my case I wanted to drop the NaT contained in the DateTimeIndex of a dataframe. I could not directly use the notnull construction as suggested by Karl D. You first have to create a temporary column out of the index, then apply the mask, and then delete the temporary column again.

df["TMP"] = df.index.values                # index is a DateTimeIndex
df = df[df.TMP.notnull()]                  # remove all NaT values
df.drop(["TMP"], axis=1, inplace=True)     # delete TMP again

Solution 4 - Python

I feel that the comment by @DSM is worth a answer on its own, because this answers the fundamental question.

The misunderstanding comes from the assumption that pd.NaT acts like None. However, while None == None returns True, pd.NaT == pd.NaT returns False. Pandas NaT behaves like a floating-point NaN, which is not equal to itself.

As the previous answer explain, you should use

df[df.b.isnull()] # or notnull(), respectively

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionJase VillamView Question on Stackoverflow
Solution 1 - PythonKarl D.View Answer on Stackoverflow
Solution 2 - PythonRaduView Answer on Stackoverflow
Solution 3 - PythonEelco van VlietView Answer on Stackoverflow
Solution 4 - PythonMichael DornerView Answer on Stackoverflow