How to remove NaN values while combining two columns in a pandas DataFrame?

Python, Pandas

Python Problem Overview


I am trying to combine two columns of a DataFrame, but I cannot get rid of the nan values in the result.

The data looks like this:

feedback_id                  _id
568a8c25cac4991645c287ac     nan
568df45b177e30c6487d3603     nan
nan                          568df434832b090048f34974
nan                          568cd22e9e82dfc166d7dff1
568df3f0832b090048f34711     nan
nan                          568e5a38b4a797c664143dda

I want:

feedback_request_id
568a8c25cac4991645c287ac
568df45b177e30c6487d3603
568df434832b090048f34974
568cd22e9e82dfc166d7dff1
568df3f0832b090048f34711
568e5a38b4a797c664143dda

Here is my code:

df3['feedback_request_id'] = ('' if df3['_id'].empty else df3['_id'].map(str)) + ('' if df3['feedback_id'].empty else df3['feedback_id'].map(str))
		

Output I'm getting:

feedback_request_id
568a8c25cac4991645c287acnan
568df45b177e30c6487d3603nan
nan568df434832b090048f34974
nan568cd22e9e82dfc166d7dff1
568df3f0832b090048f34711nan
nan568e5a38b4a797c664143dda

I have also tried this:

df3['feedback_request_id'] = ('' if df3['_id']=='nan' else df3['_id'].map(str)) + ('' if df3['feedback_id']=='nan' else df3['feedback_id'].map(str))

But it's giving the error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Python Solutions


Solution 1 - Python

You can use combine_first or fillna:

print(df['feedback_id'].combine_first(df['_id']))
0    568a8c25cac4991645c287ac
1    568df45b177e30c6487d3603
2    568df434832b090048f34974
3    568cd22e9e82dfc166d7dff1
4    568df3f0832b090048f34711
5    568e5a38b4a797c664143dda
Name: feedback_id, dtype: object

print(df['feedback_id'].fillna(df['_id']))
0    568a8c25cac4991645c287ac
1    568df45b177e30c6487d3603
2    568df434832b090048f34974
3    568cd22e9e82dfc166d7dff1
4    568df3f0832b090048f34711
5    568e5a38b4a797c664143dda
Name: feedback_id, dtype: object
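
To end up with the feedback_request_id column from the question, the result can be assigned back to the DataFrame. The following is a minimal sketch, assuming the missing values are real NaN rather than the string 'nan'; the sample frame is rebuilt by hand from the question's data, and the name concatenated is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; missing values are real NaN.
df3 = pd.DataFrame({
    "feedback_id": ["568a8c25cac4991645c287ac", "568df45b177e30c6487d3603",
                    np.nan, np.nan, "568df3f0832b090048f34711", np.nan],
    "_id": [np.nan, np.nan, "568df434832b090048f34974",
            "568cd22e9e82dfc166d7dff1", np.nan, "568e5a38b4a797c664143dda"],
})

# Take feedback_id where it is present, otherwise fall back to _id.
df3["feedback_request_id"] = df3["feedback_id"].combine_first(df3["_id"])

# Closer to the original attempt: blank out NaN first, then concatenate.
concatenated = df3["feedback_id"].fillna("") + df3["_id"].fillna("")
```

The two results only agree when each row has a value in exactly one of the columns. If both were filled, the concatenation would join the two strings, whereas combine_first would keep feedback_id.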

Solution 2 - Python

If you want a solution that doesn't require referencing df twice or any of its columns explicitly:

df.bfill(axis=1).iloc[:, 0]

With two columns, this fills the nulls in the left column with the values from the right column, then selects the left column.
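
As a small illustration of the row-wise backfill, here is a sketch on a three-row subset of the question's data, assuming real NaN values. The rename to feedback_request_id is an addition to match the output the question asks for.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "feedback_id": ["568a8c25cac4991645c287ac", np.nan, np.nan],
    "_id": [np.nan, "568df434832b090048f34974", "568cd22e9e82dfc166d7dff1"],
})

# Fill each row's gaps from the column to its right, then keep the first column.
combined = df.bfill(axis=1).iloc[:, 0].rename("feedback_request_id")
```

Because the fill runs from right to left, the column order decides which value wins when a row has more than one non-null entry.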

Solution 3 - Python

For an in-place solution, you can use pd.Series.update with pd.DataFrame.pop:

df['feedback_id'].update(df.pop('_id'))

print(df)

                feedback_id
0  568a8c25cac4991645c287ac
1  568df45b177e30c6487d3603
2  568df434832b090048f34974
3  568cd22e9e82dfc166d7dff1
4  568df3f0832b090048f34711
5  568e5a38b4a797c664143dda
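
One caveat, stated as an assumption about your setup: with Copy-on-Write enabled (opt-in in pandas 2.x and, as far as I know, the default from pandas 3.0), calling update on df['feedback_id'] acts on a temporary object and may leave df unchanged. A sketch of a variant that assigns the result back explicitly, shown on a three-row subset of the data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "feedback_id": ["568a8c25cac4991645c287ac", np.nan, "568df3f0832b090048f34711"],
    "_id": [np.nan, "568df434832b090048f34974", np.nan],
})

# pop() removes _id from df and returns it; the filled column is then
# assigned back explicitly instead of being modified through a view.
df["feedback_id"] = df["feedback_id"].fillna(df.pop("_id"))
```

Note that fillna only fills the gaps in feedback_id, while update overwrites feedback_id wherever _id is non-null. For data like the question's, where each row has exactly one value, both give the same result.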

Solution 4 - Python

The following should work. If it does not, check how the missing values are stored in your columns: bfill fills real nulls (np.nan, None, or pd.NaT), but not placeholder strings such as 'nan'.

df[['col1','col2']].bfill(axis=1).iloc[:, 0]
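
If the columns hold the literal string 'nan' instead of a real null, the backfill has nothing to fill. A sketch of one way to handle that hypothetical case, converting the placeholder to NaN first; col1 and col2 are the placeholder column names used above, and cleaned is an illustrative name:

```python
import numpy as np
import pandas as pd

# Hypothetical case: missing values were stored as the literal string "nan".
df = pd.DataFrame({
    "col1": ["568a8c25cac4991645c287ac", "nan"],
    "col2": ["nan", "568df434832b090048f34974"],
})

# Turn the placeholder strings into real nulls so bfill can see them.
cleaned = df[["col1", "col2"]].replace("nan", np.nan)
combined = cleaned.bfill(axis=1).iloc[:, 0]
```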

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type         Original Author  Original Content on Stackoverflow
Question             imSonuGupta      View Question on Stackoverflow
Solution 1 - Python  jezrael          View Answer on Stackoverflow
Solution 2 - Python  BallpointBen     View Answer on Stackoverflow
Solution 3 - Python  jpp              View Answer on Stackoverflow
Solution 4 - Python  PyBoss           View Answer on Stackoverflow