Python Pandas update a dataframe value from another dataframe

PythonPandasDataframe

Python Problem Overview


I have two dataframes in python. I want to update rows in first dataframe using matching values from another dataframe. Second dataframe serves as an override.

Here is an example with same data and code:

DataFrame 1 :

enter image description here

DataFrame 2:

enter image description here

I want to update update dataframe 1 based on matching code and name. In this example Dataframe 1 should be updated as below:

enter image description here

Note : Row with Code =2 and Name= Company2 is updated with value 1000 (coming from Dataframe 2)

import pandas as pd

data1 = {
         'Code': [1, 2, 3],
         'Name': ['Company1', 'Company2', 'Company3'],
         'Value': [200, 300, 400],
         
    }
df1 = pd.DataFrame(data1, columns= ['Code','Name','Value'])

data2 = {
         'Code': [2],
         'Name': ['Company2'],
         'Value': [1000],
    }

df2 = pd.DataFrame(data2, columns= ['Code','Name','Value'])

Any pointers or hints?

Python Solutions


Solution 1 - Python

Using DataFrame.update, which aligns on indices (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.update.html):

>>> df1.set_index('Code', inplace=True)
>>> df1.update(df2.set_index('Code'))
>>> df1.reset_index()  # to recover the initial structure

   Code      Name   Value
0     1  Company1   200.0
1     2  Company2  1000.0
2     3  Company3   400.0

Solution 2 - Python

You can using concat + drop_duplicates

pd.concat([df1,df2]).drop_duplicates(['Code','Name'],keep='last').sort_values('Code')
Out[1280]: 
   Code      Name  Value
0     1  Company1    200
0     2  Company2   1000
2     3  Company3    400

Update due to below comments

df1.set_index(['Code', 'Name'], inplace=True)

df1.update(df2.set_index(['Code', 'Name']))

df1.reset_index(drop=True, inplace=True)

Solution 3 - Python

You can merge the data first and then use numpy.where, here's how to use numpy.where

updated = df1.merge(df2, how='left', on=['Code', 'Name'], suffixes=('', '_new'))
updated['Value'] = np.where(pd.notnull(updated['Value_new']), updated['Value_new'], updated['Value'])
updated.drop('Value_new', axis=1, inplace=True)

   Code      Name   Value
0     1  Company1   200.0
1     2  Company2  1000.0
2     3  Company3   400.0

Solution 4 - Python

There is a update function available

example:

df1.update(df2)
for more info:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.update.html

Solution 5 - Python

You can align indices and then use combine_first:

res = df2.set_index(['Code', 'Name'])\
         .combine_first(df1.set_index(['Code', 'Name']))\
         .reset_index()

print(res)

#    Code      Name   Value
# 0     1  Company1   200.0
# 1     2  Company2  1000.0
# 2     3  Company3   400.0

Solution 6 - Python

You can use pd.Series.where on the result of left-joining df1 and df2

merged = df1.merge(df2, on=['Code', 'Name'], how='left')
df1.Value = merged.Value_y.where(~merged.Value_y.isnull(), df1.Value)
>>> df1
    Code	Name	Value
0	1	Company1	200.0
1	2	Company2	1000.0
2	3	Company3	400.0

You can change the line to

df1.Value = merged.Value_y.where(~merged.Value_y.isnull(), df1.Value).astype(int)

in order to return the value to be an integer.

Solution 7 - Python

Assuming company and code are redundant identifiers, you can also do

import pandas as pd
vdic = pd.Series(df2.Value.values, index=df2.Name).to_dict()

df1.loc[df1.Name.isin(vdic.keys()), 'Value'] = df1.loc[df1.Name.isin(vdic.keys()), 'Name'].map(vdic)

#   Code      Name  Value
#0     1  Company1    200
#1     2  Company2   1000
#2     3  Company3    400

Solution 8 - Python

There's something I often do.

I merge 'left' first:

df_merged = pd.merge(df1, df2, how = 'left', on = 'Code')

Pandas will create columns with extension '_x' (for your left dataframe) and '_y' (for your right dataframe)

You want the ones that came from the right. So just remove any columns with '_x' and rename '_y':

for col in df_merged.columns:
    if '_x' in col:
        df_merged .drop(columns = col, inplace = True)
    if '_y' in col:
        new_name = col.strip('_y')
        df_merged .rename(columns = {col : new_name }, inplace=True)

Solution 9 - Python

  1. Append the dataset
  2. Drop the duplicate by code
  3. Sort the values
combined_df = combined_df.append(df2).drop_duplicates(['Code'],keep='last').sort_values('Code')

Solution 10 - Python

None of the above solutions worked for my particular example, which I think is rooted in the dtype of my columns, but I eventually came to this solution

indexes = df1.loc[df1.Code.isin(df2.Code.values)].index
df1.at[indexes,'Value'] = df2['Value'].values

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionProgSkyView Question on Stackoverflow
Solution 1 - PythonNicView Answer on Stackoverflow
Solution 2 - PythonBENYView Answer on Stackoverflow
Solution 3 - PythonBubble Bubble Bubble GutView Answer on Stackoverflow
Solution 4 - Pythonsafiqul islamView Answer on Stackoverflow
Solution 5 - PythonjppView Answer on Stackoverflow
Solution 6 - PythonAmi TavoryView Answer on Stackoverflow
Solution 7 - PythonALollzView Answer on Stackoverflow
Solution 8 - PythonCaio EstrellaView Answer on Stackoverflow
Solution 9 - PythonmuTheTechieView Answer on Stackoverflow
Solution 10 - Pythonarie64View Answer on Stackoverflow