Fillna in multiple columns in place in Python Pandas

Python Problem Overview


I have a pandas DataFrame of mixed types: some columns are strings and some are numbers. I would like to replace the NaN values in string columns with '.', and the NaN values in float columns with 0.

Consider this small fictitious example:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Name':['Jack','Sue',np.nan,'Bob','Alice','John'],
    'A': [1, 2.1, np.nan, 4.7, 5.6, 6.8],
    'B': [.25, np.nan, np.nan, 4, 12.2, 14.4],
    'City':['Seattle','SF','LA','OC',np.nan,np.nan]})

Now, I can do it in 3 lines:

df['Name'].fillna('.',inplace=True)
df['City'].fillna('.',inplace=True)
df.fillna(0,inplace=True)

Since this is a small dataframe, 3 lines is probably OK. In my real example (which I cannot share here for data confidentiality reasons), I have many more string columns and numeric columns, so I end up writing many lines just for fillna. Is there a concise way of doing this?

Python Solutions


Solution 1 - Python

Came across this page while looking for an answer to this problem, but didn't like the existing answers. I ended up finding something better in the DataFrame.fillna documentation, and figured I'd contribute for anyone else that happens upon this.

If you have multiple columns, but only want to replace the NaN in a subset of them, you can use:

df.fillna({'Name':'.', 'City':'.'}, inplace=True)

This also allows you to specify different replacements for each column. And if you want to go ahead and fill all remaining NaN values, you can chain another fillna on the end. My original post did that with two in-place calls:

df.fillna({'Name':'.', 'City':'.'}, inplace=True).fillna(0, inplace=True)

Edit (22 Apr 2021)

That chain no longer works (functionality presumably changed since the original post): at least in the pandas releases where fillna(..., inplace=True) returns None, there is nothing for the second call to operate on. You can still chain, but you must drop inplace and assign the result back to df, like so:

df = df.fillna({'Name':'.', 'City':'.'}).fillna(0)
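
If there are too many columns to type out, the per-column dict can be built from the dtypes instead. The following is only a sketch that combines the dict form above with a dtype check. The name fill_values is made up for illustration, and using pandas.api.types.is_numeric_dtype as the test for "numeric" is my assumption about what you want.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

df = pd.DataFrame({
    'Name': ['Jack', 'Sue', np.nan, 'Bob', 'Alice', 'John'],
    'A': [1, 2.1, np.nan, 4.7, 5.6, 6.8],
    'B': [.25, np.nan, np.nan, 4, 12.2, 14.4],
    'City': ['Seattle', 'SF', 'LA', 'OC', np.nan, np.nan],
})

# One fill value per column: 0 for numeric columns, '.' for everything else.
# (fill_values is a hypothetical helper name, not part of pandas.)
fill_values = {col: 0 if is_numeric_dtype(df[col]) else '.' for col in df.columns}

# A single fillna call with the dict; the result is a new DataFrame.
filled = df.fillna(fill_values)
print(filled)
```

Because every column gets its own entry, no second fillna call is needed.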

Solution 2 - Python

You could use apply on your columns and check each column's dtype.kind to decide whether it is numeric:

res = df.apply(lambda x: x.fillna(0) if x.dtype.kind in 'biufc' else x.fillna('.'))

print(res)
     A      B     City   Name
0  1.0   0.25  Seattle   Jack
1  2.1   0.00       SF    Sue
2  0.0   0.00       LA      .
3  4.7   4.00       OC    Bob
4  5.6  12.20        .  Alice
5  6.8  14.40        .   John
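
For anyone wondering what the letters in 'biufc' stand for, here is a small sketch that prints the NumPy kind code for a few dtypes. The sample Series are invented for illustration. The object Series is created with dtype=object explicitly so the result does not depend on how your pandas version infers string columns.

```python
import numpy as np
import pandas as pd

# A few small Series with explicitly chosen dtypes (illustrative data only).
samples = {
    'bool': pd.Series([True, False]),
    'int': pd.Series([1, 2], dtype='int64'),
    'uint': pd.Series([1, 2], dtype='uint8'),
    'float': pd.Series([1.5, np.nan]),
    'complex': pd.Series([1 + 2j]),
    'object': pd.Series(['a', None], dtype=object),
}

# dtype.kind is a one-character code describing the general type of the data.
kinds = {name: s.dtype.kind for name, s in samples.items()}
print(kinds)

# The membership test used in the lambda above.
is_numeric = {name: kind in 'biufc' for name, kind in kinds.items()}
print(is_numeric)
```

Note that 'b' (boolean) is part of the string, so boolean columns go down the fillna(0) branch as well.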

Solution 3 - Python

You can either list the string columns by hand or glean them from df.dtypes. Once you have the list of string/object columns, you can call fillna on all those columns at once.

# str_cols = ['Name','City']
str_cols = df.columns[df.dtypes==object]
df[str_cols] = df[str_cols].fillna('.')
df = df.fillna(0)

Solution 4 - Python

Define a function:

import numpy as np

def myfillna(series):
    # Compare dtypes by value with ==, and use numpy directly instead of pd.np
    if series.dtype == np.dtype(float):
        return series.fillna(0)
    elif series.dtype == np.dtype(object):
        return series.fillna('.')
    else:
        return series

You can add other elif statements if you want to fill a column of a different dtype in some other way. Now apply this function over all columns of the dataframe:

df = df.apply(myfillna)

Assigning the result back to df has the same effect as inplace=True.

Solution 5 - Python

There is a simpler way, that can be done in one line:

df.fillna({'Name':0,'City':0},inplace=True)

Not an awesome improvement, but if you multiply it by 100, writing only the column names + ':0' is way faster than copying and pasting everything 100 times.

Solution 6 - Python

If you want to fill the NaN values in a list of columns ("lst") with the same value ("v"):

def nan_to_zero(df, lst, v):
    d = {x:v for x in lst}
    df.fillna(d, inplace=True)
    return df
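
If you would rather not modify the frame that was passed in, a non-mutating variant could look like the sketch below. The name fill_columns and the sample data are mine, purely for illustration. dict.fromkeys builds the same kind of column-to-value mapping as the comprehension above.

```python
import numpy as np
import pandas as pd

def fill_columns(df, columns, value):
    """Return a new DataFrame with NaN replaced by value in the given columns only."""
    # dict.fromkeys(['Name', 'City'], '.') -> {'Name': '.', 'City': '.'}
    return df.fillna(dict.fromkeys(columns, value))

df = pd.DataFrame({
    'Name': ['Jack', np.nan],
    'City': [np.nan, 'SF'],
    'A': [1.0, np.nan],
})

filled = fill_columns(df, ['Name', 'City'], '.')
print(filled)
```

Column 'A' is not in the list, so its NaN is left alone, and the original df is unchanged.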

Solution 7 - Python

If you don't want to specify individual per-column replacement values, it is tempting to do it this way:

df[['Name', 'City']].fillna('.',inplace=True)

However, df[['Name', 'City']] returns a new DataFrame, so the in-place fill is applied to that temporary copy and df itself is left unchanged. Assign the result back instead:

columns = ['Name', 'City']
df[columns] = df[columns].fillna('.')
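
To see the difference between the two forms, here is a small check on made-up data. The variable name subset is only there to make the intermediate object visible. Depending on your pandas version, the in-place call on the selection may also emit a warning.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Jack', np.nan], 'City': [np.nan, 'SF']})
columns = ['Name', 'City']

# Selecting with a list of column names produces a new DataFrame ...
subset = df[columns]
# ... so filling it in place changes only that new object.
subset.fillna('.', inplace=True)
print(df.isna().sum().sum())  # NaN count in the original df

# Assigning the filled result back is what actually updates df.
df[columns] = df[columns].fillna('.')
print(df.isna().sum().sum())
```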

Solution 8 - Python

The most concise and readable way to accomplish this, especially with many columns, is to combine df.select_dtypes with .columns.

df.select_dtypes returns a new df containing only the columns that match the dtype you need.

df.columns returns the column names of your df (as an Index).

Full code:

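float_column_names = df.select_dtypes(float).columns
df[float_column_names] = df[float_column_names].fillna(0)

# Where string columns have object dtype, select them with object
# (select_dtypes(str) raises a TypeError in those pandas versions)
string_column_names = df.select_dtypes(object).columns
df[string_column_names] = df[string_column_names].fillna('.')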
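
A variant that does not depend on the exact dtype of the string columns is to split on include='number' and exclude='number'. This is a sketch on invented data, and it assumes every non-numeric column should get '.'. The integer column 'N' is added only to show that 'number' also picks up integer columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Jack', np.nan, 'Bob'],
    'A': [1.0, np.nan, 4.7],
    'N': [1, 2, 3],
    'City': ['Seattle', 'SF', np.nan],
})

# 'number' matches integer and float columns; exclude='number' is the rest.
numeric_cols = df.select_dtypes(include='number').columns
other_cols = df.select_dtypes(exclude='number').columns

df[numeric_cols] = df[numeric_cols].fillna(0)
df[other_cols] = df[other_cols].fillna('.')
print(df)
```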

Solution 9 - Python

A much easier way is dt.replace(np.nan, "NA") (with numpy imported as np). If you want a different replacement, use dt.replace("pattern", "replaced by (new pattern)").
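
One thing to be aware of with this approach: replace acts on every column, so a numeric column that contains NaN ends up holding the string too. A small sketch on made-up data (the frame is called df here rather than dt):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Jack', np.nan], 'A': [1.0, np.nan]})

# Every NaN is replaced, in string and numeric columns alike.
replaced = df.replace(np.nan, 'NA')
print(replaced)
print(replaced.dtypes)
```

If the numeric columns should stay numeric, one of the per-column approaches above is probably a better fit.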

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type         Original Author     Original Content on Stackoverflow
Question             ozzy                View Question on Stackoverflow
Solution 1 - Python  Rob Bulmahn         View Answer on Stackoverflow
Solution 2 - Python  Anton Protopopov    View Answer on Stackoverflow
Solution 3 - Python  Bob Baxley          View Answer on Stackoverflow
Solution 4 - Python  latorrefabian       View Answer on Stackoverflow
Solution 5 - Python  Vinicius Raphael    View Answer on Stackoverflow
Solution 6 - Python  Tom                 View Answer on Stackoverflow
Solution 7 - Python  Devyzr              View Answer on Stackoverflow
Solution 8 - Python  Berel Levy          View Answer on Stackoverflow
Solution 9 - Python  A. chahid           View Answer on Stackoverflow