Python Pandas replace NaN in one column with value from corresponding row of second column

PythonPandasDataframeNanFillna

Python Problem Overview


I am working with this Pandas DataFrame in Python.

File    heat	Farheit	Temp_Rating
   1    YesQ	     75		    N/A
   1    NoR	        115		    N/A
   1    YesA	     63		    N/A
   1    NoT	         83		     41
   1    NoY	        100   		 80
   1    YesZ	     56		     12
   2    YesQ	    111		    N/A
   2    NoR    	     60	        N/A
   2    YesA	     19	        N/A
   2    NoT	        106	         77
   2    NoY    	     45     	 21
   2    YesZ	     40		     54
   3    YesQ	     84		    N/A
   3    NoR	         67		    N/A
   3    YesA	     94		    N/A
   3    NoT	         68		     39
   3    NoY    	     63		     46
   3    YesZ	     34		     81

I need to replace all NaNs in the Temp_Rating column with the value from the Farheit column.

This is what I need:

File        heat	Temp_Rating
   1        YesQ	         75
   1	     NoR            115
   1	    YesA	         63
   1	    YesQ	         41
   1	     NoR             80
   1	    YesA	         12
   2	    YesQ	        111
   2	     NoR             60
   2	    YesA	         19
   2	     NoT   	         77
   2	     NoY             21
   2	    YesZ	         54
   3	    YesQ	         84
   3    	 NoR             67
   3	    YesA	         94
   3	     NoT             39
   3	     NoY             46
   3	    YesZ	         81

If I do a Boolean selection, I can pick out only one of these columns at a time. The problem is if I then try to join them, I am not able to do this while preserving the correct order.

How can I only find Temp_Rating rows with the NaNs and replace them with the value in the same row of the Farheit column?

Python Solutions


Solution 1 - Python

Assuming your DataFrame is in df:

df.Temp_Rating.fillna(df.Farheit, inplace=True)
del df['Farheit']
df.columns = 'File heat Observations'.split()

First replace any NaN values with the corresponding value of df.Farheit. Delete the 'Farheit' column. Then rename the columns. Here's the resulting DataFrame:

resulting DataFrame

Solution 2 - Python

The above mentioned solutions did not work for me. The method I used was:

df.loc[df['foo'].isnull(),'foo'] = df['bar']

Solution 3 - Python

An other way to solve this problem,

import pandas as pd
import numpy as np

ts_df = pd.DataFrame([[1,"YesQ",75,],[1,"NoR",115,],[1,"NoT",63,13],[2,"YesT",43,71]],columns=['File','heat','Farheit','Temp'])


def fx(x):
    if np.isnan(x['Temp']):
        return x['Farheit']
    else:
        return x['Temp']
print(1,ts_df)
ts_df['Temp']=ts_df.apply(lambda x : fx(x),axis=1)

print(2,ts_df)

returns:

(1,    File  heat  Farheit  Temp                                                                                    
0     1  YesQ       75   NaN                                                                                        
1     1   NoR      115   NaN                                                                                        
2     1   NoT       63  13.0                                                                                        
3     2  YesT       43  71.0)                                                                                       
(2,    File  heat  Farheit   Temp                                                                                   
0     1  YesQ       75   75.0                                                                                       
1     1   NoR      115  115.0
2     1   NoT       63   13.0
3     2  YesT       43   71.0)

Solution 4 - Python

@Jonathan's answer is good, but an overkill, just use pop:

df['Temp_Rating'] = df['Temp_Rating'].fillna(df.pop('Farheit'))

Solution 5 - Python

The accepted answer uses fillna() which will fill in missing values where the two dataframes share indices. As explained nicely here, you can use combine_first to fill in missing values, rows and index values for situations where the indices of the two dataframes don't match.

df.Col1 = df.Col1.fillna(df.Col2) #fill in missing values if indices match

#or 
df.Col1 = df.Col1.combine_first(df.Col2) #fill in values, rows, and indices

Solution 6 - Python

You can also use mask which replaces the values where Temp_Rating is NaN by the column Farheit:

df['Temp_Rating'] = df['Temp_Rating'].mask(df['Temp_Rating'].isna(), df['Farheit'])

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionedeszView Question on Stackoverflow
Solution 1 - PythonJonathan EuniceView Answer on Stackoverflow
Solution 2 - Pythonzsad512View Answer on Stackoverflow
Solution 3 - Pythonfelix_asView Answer on Stackoverflow
Solution 4 - PythonU12-ForwardView Answer on Stackoverflow
Solution 5 - PythonJohnView Answer on Stackoverflow
Solution 6 - PythonrachwaView Answer on Stackoverflow