How to set a cell to NaN in a pandas dataframe

Python Problem Overview

I'd like to replace bad values in a column of a dataframe with NaN.

import numpy as np
import pandas as pd

mydata = {'x' : [10, 50, 18, 32, 47, 20], 'y' : ['12', '11', 'N/A', '13', '15', 'N/A']}
df = pd.DataFrame(mydata)

df[df.y == 'N/A']['y'] = np.nan

However, the last line does not change df and raises a warning, because the assignment operates on a copy of df. What is the correct way to handle this? I've seen many solutions with iloc or ix, but here I need to use a boolean condition.

Python Solutions

Solution 1 - Python

Just use replace:

In [106]:
df.replace('N/A', np.nan)

Out[106]:
    x    y
0  10   12
1  50   11
2  18  NaN
3  32   13
4  47   15
5  20  NaN

What you're trying is called chained indexing: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

You can use loc to ensure you operate on the original df:

In [108]:
df.loc[df['y'] == 'N/A','y'] = np.nan
df

Out[108]:
    x    y
0  10   12
1  50   11
2  18  NaN
3  32   13
4  47   15
5  20  NaN
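As a self-contained check of the loc approach, here is a minimal sketch that rebuilds the question's dataframe and confirms which rows ended up missing. The variable names bad and missing_rows are only illustrative.

```python
import numpy as np
import pandas as pd

mydata = {'x': [10, 50, 18, 32, 47, 20],
          'y': ['12', '11', 'N/A', '13', '15', 'N/A']}
df = pd.DataFrame(mydata)

# Build the boolean condition once, then pass rows and column
# to a single .loc call so the assignment hits df itself.
bad = df['y'] == 'N/A'
df.loc[bad, 'y'] = np.nan

# Collect the index labels where 'y' is now missing.
missing_rows = df.index[df['y'].isna()].tolist()
print(missing_rows)  # [2, 5]
```

Because the row selection and the column label go through one indexing operation, there is no intermediate object for pandas to copy.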

Solution 2 - Python

Most of the answers above import an extra module (import numpy as np) just to get a NaN value.

pandas has its own built-in missing-value marker, pd.NA, which you can use like this:

df.replace('N/A', pd.NA)
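A minimal sketch of this, assuming pandas 1.0 or later (where pd.NA exists) and a shortened version of the question's data. Note that replace returns a new DataFrame, so the result is assigned back here.

```python
import pandas as pd

df = pd.DataFrame({'x': [10, 50, 18], 'y': ['12', 'N/A', '13']})

# replace does not modify df in place by default; keep the returned frame.
df = df.replace('N/A', pd.NA)

# isna() recognises pd.NA as a missing value.
n_missing = int(df['y'].isna().sum())
print(n_missing)  # 1
```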

Solution 3 - Python

While replace seems to solve the problem, I would like to propose an alternative. When a column mixes numeric values with a few strings, the goal should not be to swap the strings for np.nan but to make the whole column properly numeric. I would bet that the original column is of object type:

Name: y, dtype: object

What you really need is to make it a numeric column (it will have the proper type and be noticeably faster to work with), with all non-numeric values replaced by NaN.

Thus, good conversion code would be

df['y'] = pd.to_numeric(df['y'], errors='coerce')

Specify errors='coerce' to force strings that can't be parsed to a numeric value to become NaN. The column type then becomes

Name: y, dtype: float64
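The conversion can be sketched end to end on the question's data. This is only an illustration; the variable dtype_before is a name introduced here.

```python
import pandas as pd

df = pd.DataFrame({'x': [10, 50, 18, 32, 47, 20],
                   'y': ['12', '11', 'N/A', '13', '15', 'N/A']})

dtype_before = df['y'].dtype

# Strings that cannot be parsed as numbers ('N/A') become NaN.
df['y'] = pd.to_numeric(df['y'], errors='coerce')

print(dtype_before, '->', df['y'].dtype)

# Numeric operations now work and skip the missing values by default.
print(df['y'].mean())  # 12.75
```

Once the column is numeric, aggregations such as mean() work directly, which is the practical benefit over leaving strings and NaN mixed in one column.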

Solution 4 - Python

You can use replace:

df['y'] = df['y'].replace({'N/A': np.nan})

Also be aware of the inplace parameter for replace. You can do something like:

df.replace({'N/A': np.nan}, inplace=True)

This replaces every 'N/A' in df and modifies df itself instead of returning a new DataFrame.

Similarly, if you run into other kinds of unknown values, such as an empty string or None:

df['y'] = df['y'].replace({'': np.nan})

df['y'] = df['y'].replace({None: np.nan})

Reference: pandas documentation for DataFrame.replace
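If several placeholder strings occur in the same column, they can be handled in one call by passing a list to replace. The sketch below uses a small made-up Series, not the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical column with three kinds of "unknown": 'N/A', '' and None.
s = pd.Series(['12', 'N/A', '', None, '15'], name='y')

# A list of values to replace, all mapped to the same NaN.
cleaned = s.replace(['N/A', ''], np.nan)

# None is already treated as missing by isna(), so it needs no replacement.
print(cleaned.isna().tolist())  # [False, True, True, True, False]
```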

Solution 5 - Python

As of pandas 1.0.0, you no longer need NumPy to create null values in your dataframe. Instead you can use pandas.NA (which is of type pandas._libs.missing.NAType). pandas treats it as missing inside a dataframe, but it is its own singleton object and not Python's None, so a check such as "value is None" will not detect it outside pandas; use pd.isna(value) instead.
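A short sketch of how pd.NA behaves, assuming pandas 1.0 or later. The nullable 'Int64' dtype used at the end is an addition for illustration and is not mentioned in the answer above.

```python
import pandas as pd

print(type(pd.NA).__name__)  # NAType
print(pd.isna(pd.NA))        # True
print(pd.NA is None)         # False

# With a nullable integer dtype, pd.NA can sit next to integers
# without the column being converted to float.
s = pd.Series([1, pd.NA, 3], dtype='Int64')
print(s.isna().tolist())     # [False, True, False]
```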

Solution 6 - Python

df.loc[df.y == 'N/A',['y']] = np.nan

This solves your problem. With the double [] (df[...]['y'], as in the question), you are working on a copy of the DataFrame. You have to specify the exact location in a single call to be able to modify it.
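The column part of the single loc call can be written either as a label ('y') or as a list of labels (['y']). The sketch below, using a hypothetical helper make_df and shortened data, suggests both forms mark the same rows as missing.

```python
import numpy as np
import pandas as pd

def make_df():
    # Hypothetical helper: a fresh, shortened copy of the question's data.
    return pd.DataFrame({'x': [10, 50, 18], 'y': ['12', 'N/A', '13']})

a = make_df()
a.loc[a.y == 'N/A', 'y'] = np.nan     # single column label

b = make_df()
b.loc[b.y == 'N/A', ['y']] = np.nan   # list of column labels

missing_a = a['y'].isna().tolist()
missing_b = b['y'].isna().tolist()
print(missing_a, missing_b)
```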

Solution 7 - Python

df.replace('columnvalue', np.nan, inplace=True)

Solution 8 - Python

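You can try these snippets.

In [16]: mydata = {'x' : [10, 50, 18, 32, 47, 20], 'y' : ['12', '11', 'N/A', '13', '15', 'N/A']}
In [17]: df = pd.DataFrame(mydata)
In [18]: df.y[df.y == "N/A"] = np.nan
In [19]: df
Out[19]:
    x    y
0  10   12
1  50   11
2  18  NaN
3  32   13
4  47   15
5  20  NaN

Note that df.y[...] = ... is itself a form of chained assignment: it can raise a warning, and with Copy-on-Write enabled it leaves df unchanged, so a single df.loc call is the safer choice.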

Content Type	Original Author	Original Content on Stackoverflow
Question	Mark Morrisson	View Question on Stackoverflow
Solution 1 - Python	EdChum	View Answer on Stackoverflow
Solution 2 - Python	stallingOne	View Answer on Stackoverflow
Solution 3 - Python	Severin Pappadeux	View Answer on Stackoverflow
Solution 4 - Python	jmorrison	View Answer on Stackoverflow
Solution 5 - Python	slevin886	View Answer on Stackoverflow
Solution 6 - Python	jeremie benichou	View Answer on Stackoverflow
Solution 7 - Python	sameer_nubia	View Answer on Stackoverflow
Solution 8 - Python	rolandpeng	View Answer on Stackoverflow