Find empty or NaN entry in Pandas Dataframe

Tags: List, Python 2.7, Pandas, Indexing, Dataframe

List Problem Overview


I am trying to search through a Pandas Dataframe to find where it has a missing entry or a NaN entry.

Here is a dataframe that I am working with:

cl_id       a           c         d         e        A1              A2             A3
    0       1   -0.419279  0.843832 -0.530827    text76        1.537177      -0.271042
    1       2    0.581566  2.257544  0.440485    dafN_6        0.144228       2.362259
    2       3   -1.259333  1.074986  1.834653    system                       1.100353
    3       4   -1.279785  0.272977  0.197011     Fifty       -0.031721       1.434273
    4       5    0.578348  0.595515  0.553483   channel        0.640708       0.649132
    5       6   -1.549588 -0.198588  0.373476     audio       -0.508501               
    6       7    0.172863  1.874987  1.405923    Twenty             NaN            NaN
    7       8   -0.149630 -0.502117  0.315323  file_max             NaN            NaN

NOTE: The blank entries are empty strings - this is because there was no alphanumeric content in the file that the dataframe came from.

If I have this dataframe, how can I find a list with the indexes where the NaN or blank entry occurs?

List Solutions


Solution 1 - List

np.where(pd.isnull(df)) returns the row and column indices where the value is NaN:

In [152]: import numpy as np
In [153]: import pandas as pd
In [154]: np.where(pd.isnull(df))
Out[154]: (array([2, 5, 6, 6, 7, 7]), array([7, 7, 6, 7, 6, 7]))

In [155]: df.iloc[2,7]
Out[155]: nan

In [160]: [df.iloc[i,j] for i,j in zip(*np.where(pd.isnull(df)))]
Out[160]: [nan, nan, nan, nan, nan, nan]

Finding values which are empty strings could be done with applymap:

In [182]: np.where(df.applymap(lambda x: x == ''))
Out[182]: (array([5]), array([7]))

Note that using applymap requires calling a Python function once for each cell of the DataFrame. That could be slow for a large DataFrame, so it would be better if you could arrange for all the blank cells to contain NaN instead so you could use pd.isnull.
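For example, converting the blank strings to NaN up front (a sketch with hypothetical values standing in for the question's data) lets a single pd.isnull pass find every missing cell:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the question's data (hypothetical values)
df = pd.DataFrame({'A1': ['text76', '', 'Twenty'],
                   'A2': [1.5, '', np.nan]})

# Turn empty strings into NaN so one pd.isnull pass finds everything
df = df.replace('', np.nan)
rows, cols = np.where(pd.isnull(df))
print(list(zip(rows, cols)))  # → [(1, 0), (1, 1), (2, 1)]
```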

Solution 2 - List

Try this:

df[df['column_name'] == ''].index

and for NaNs you can try:

pd.isna(df['column_name'])
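Combining the two checks gives the index list the question asks for. A minimal sketch, assuming a column named 'A2' with hypothetical values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A2': [0.64, '', np.nan, 2.36]})  # hypothetical values

# OR the empty-string mask with the NaN mask, then take the index
mask = pd.isna(df['A2']) | (df['A2'] == '')
print(df[mask].index.tolist())  # → [1, 2]
```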

Solution 3 - List

Check whether the columns contain NaN using .isnull() and check for empty strings using .eq(''), then join the two together with the bitwise OR operator |.

Sum along axis 0 to find the columns with missing data, then sum along axis 1 to find the index locations of the rows with missing data.

missing_cols, missing_rows = (
    (df2.isnull().sum(x) | df2.eq('').sum(x))
    .loc[lambda x: x.gt(0)].index
    for x in (0, 1)
)

>>> df2.loc[missing_rows, missing_cols]
         A2       A3
2            1.10035
5 -0.508501         
6       NaN      NaN
7       NaN      NaN
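The same two-mask idea can be run end to end on a small toy frame (hypothetical data, not the question's df2):

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({'a': [1, 2, 3],
                    'A2': [0.1, '', np.nan],
                    'A3': ['x', 'y', np.nan]})

# axis=0 sums per column, axis=1 sums per row; OR merges the NaN and
# empty-string counts, and .gt(0) keeps only labels with missing data
missing_cols, missing_rows = (
    (df2.isnull().sum(axis) | df2.eq('').sum(axis))
    .loc[lambda s: s.gt(0)].index
    for axis in (0, 1)
)
print(list(missing_cols))  # → ['A2', 'A3']
print(list(missing_rows))  # → [1, 2]
```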

Solution 4 - List

I've resorted to

df[(df[column_name].notnull()) & (df[column_name] != u'')].index

lately. It handles both null and empty-string cells in one go, returning the index of the rows where the column has neither.

Solution 5 - List

In my opinion, don't waste time: just replace the blanks with NaN, then search for all entries with NaN. (This is correct because empty values are missing values anyway.)

import numpy as np                             # to use np.nan
import pandas as pd                            # to use replace

df = df.replace('', np.nan)                    # turn empty strings into NaN
nan_values = df[df.isna().any(axis=1)]         # all rows with at least one NaN

nan_values                                     # view df with NaN rows only

Solution 6 - List

Partial solution: for a single string column, tmp = df['A1'].fillna(''); isEmpty = tmp == '' gives a boolean Series that is True wherever there is an empty string or a NaN value.
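Extending that to an index list (a sketch using the question's 'A1' column name with hypothetical values):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A1': ['text76', np.nan, '', 'Fifty']})  # hypothetical values

tmp = df['A1'].fillna('')
isEmpty = tmp == ''               # True for both NaN and empty-string cells
print(df.index[isEmpty].tolist())  # → [1, 2]
```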

Solution 7 - List

Another option, covering cases where there might be several spaces, is to use Python's isspace() string method.

df[df.col_name.apply(lambda x: not x.isspace())]  # keeps only rows whose value is not whitespace-only

Adding NaN values (note that calling x.isspace() on a NaN float would raise, so check the type first; also note that ''.isspace() is False, so truly empty strings are kept by this check):

df[df.col_name.apply(lambda x: isinstance(x, str) and not x.isspace())]  # drops NaN and whitespace-only rows

Solution 8 - List

You can also do something like this:

text_empty = df['column name'].str.len() == 0

df.loc[text_empty].index

The result is the rows which are empty, together with their index numbers.

Solution 9 - List

To obtain all the rows that contain an empty cell in a particular column:

DF_new_row = DF_raw.loc[DF_raw['columnname'] == '']

This gives the subset of DF_raw that satisfies the checking condition.

Solution 10 - List

You can use string methods with a regex to count cells that contain no word characters (i.e. empty strings):

df[~df.column_name.str.contains(r'\w')].column_name.count()
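One caveat: str.contains returns NaN for NaN cells, which breaks boolean indexing on a column that has real NaN values. Passing na=False makes NaN cells count as "no word character", so one mask catches both cases (a sketch with a hypothetical column name):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'column_name': ['text76', '', np.nan, 'Fifty']})

# na=False makes NaN cells fail the "contains a word character" test too
empty_or_nan = df[~df.column_name.str.contains(r'\w', na=False)]
print(empty_or_nan.index.tolist())  # → [1, 2]
```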

Solution 11 - List

This is how to check whether all values of a column are empty or NaN, when df is your dataframe:

    # Check if there are any entirely empty columns
    df = df.replace('', np.nan)
    nan_cols = []
    for col in df.columns:
        if pd.isna(df[col]).all():
            nan_cols.append(col)
    print(nan_cols)

This code returns a list of the columns whose values are empty or NaN in every row.
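A shorter equivalent of the loop, using vectorized isna().all() (a sketch with hypothetical column names, replacing empty and whitespace-only strings with NaN first):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'full': [1, 2], 'blank': ['', ' '], 'nans': [np.nan, np.nan]})

df = df.replace(r'^\s*$', np.nan, regex=True)  # empty/whitespace-only -> NaN
nan_cols = df.columns[df.isna().all()].tolist()
print(nan_cols)  # → ['blank', 'nans']
```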

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type      | Original Author   | Original Content on Stackoverflow
Question          | edesz             | View Question on Stackoverflow
Solution 1 - List | unutbu            | View Answer on Stackoverflow
Solution 2 - List | Vyachez           | View Answer on Stackoverflow
Solution 3 - List | Alexander         | View Answer on Stackoverflow
Solution 4 - List | jeremy_rutman     | View Answer on Stackoverflow
Solution 5 - List | Zhannie           | View Answer on Stackoverflow
Solution 6 - List | lahoffm           | View Answer on Stackoverflow
Solution 7 - List | saias             | View Answer on Stackoverflow
Solution 8 - List | Mohamed Abdelsalam | View Answer on Stackoverflow
Solution 9 - List | Shara             | View Answer on Stackoverflow
Solution 10 - List | Todd S           | View Answer on Stackoverflow
Solution 11 - List | KapoNYC          | View Answer on Stackoverflow