join or merge with overwrite in pandas

PythonPandas

Python Problem Overview


I want to perform a join/merge/append operation on a dataframe with datetime index.

Let's say I have df1 and I want to add df2 to it. df2 can have fewer or more columns, and overlapping indexes. For all rows where the indexes match, if df2 has the same column as df1, I want the values of df1 be overwritten with those from df2.

How can I obtain the desired result?

Python Solutions


Solution 1 - Python

How about: df2.combine_first(df1)?

In [33]: df2
Out[33]: 
                   A         B         C         D
2000-01-03  0.638998  1.277361  0.193649  0.345063
2000-01-04 -0.816756 -1.711666 -1.155077 -0.678726
2000-01-05  0.435507 -0.025162 -1.112890  0.324111
2000-01-06 -0.210756 -1.027164  0.036664  0.884715
2000-01-07 -0.821631 -0.700394 -0.706505  1.193341
2000-01-10  1.015447 -0.909930  0.027548  0.258471
2000-01-11 -0.497239 -0.979071 -0.461560  0.447598

In [34]: df1
Out[34]: 
                   A         B         C
2000-01-03  2.288863  0.188175 -0.040928
2000-01-04  0.159107 -0.666861 -0.551628
2000-01-05 -0.356838 -0.231036 -1.211446
2000-01-06 -0.866475  1.113018 -0.001483
2000-01-07  0.303269  0.021034  0.471715
2000-01-10  1.149815  0.686696 -1.230991
2000-01-11 -1.296118 -0.172950 -0.603887
2000-01-12 -1.034574 -0.523238  0.626968
2000-01-13 -0.193280  1.857499 -0.046383
2000-01-14 -1.043492 -0.820525  0.868685

In [35]: df2.comb
df2.combine        df2.combineAdd     df2.combine_first  df2.combineMult    

In [35]: df2.combine_first(df1)
Out[35]: 
                   A         B         C         D
2000-01-03  0.638998  1.277361  0.193649  0.345063
2000-01-04 -0.816756 -1.711666 -1.155077 -0.678726
2000-01-05  0.435507 -0.025162 -1.112890  0.324111
2000-01-06 -0.210756 -1.027164  0.036664  0.884715
2000-01-07 -0.821631 -0.700394 -0.706505  1.193341
2000-01-10  1.015447 -0.909930  0.027548  0.258471
2000-01-11 -0.497239 -0.979071 -0.461560  0.447598
2000-01-12 -1.034574 -0.523238  0.626968       NaN
2000-01-13 -0.193280  1.857499 -0.046383       NaN
2000-01-14 -1.043492 -0.820525  0.868685       NaN

Note that it takes the values from df1 for indices that do not overlap with df2. If this doesn't do exactly what you want I would be willing to improve this function / add options to it.

Solution 2 - Python

For a merge like this, the update method of a DataFrame is useful.

Taking the examples from the documentation:

import pandas as pd
import numpy as np

df1 = pd.DataFrame([[np.nan, 3., 5.], [-4.6, 2.1, np.nan],
                   [np.nan, 7., np.nan]])
df2 = pd.DataFrame([[-42.6, np.nan, -8.2], [-5., 1.6, 4]],
                   index=[1, 2])

Data before the update:

>>> df1
     0    1    2
0  NaN  3.0  5.0
1 -4.6  2.1  NaN
2  NaN  7.0  NaN
>>>
>>> df2
      0    1    2
1 -42.6  NaN -8.2
2  -5.0  1.6  4.0

Let's update df1 with data from df2:

df1.update(df2)

Data after the update:

>>> df1
      0    1    2
0   NaN  3.0  5.0
1 -42.6  2.1 -8.2
2  -5.0  1.6  4.0

Remarks:

  • It's important to notice that this is an operation "in place", modifying the DataFrame that calls update.
  • Also note that non NaN values in df1 are not overwritten with NaN values in df2

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionsaroeleView Question on Stackoverflow
Solution 1 - PythonWes McKinneyView Answer on Stackoverflow
Solution 2 - PythonNicolás OzimicaView Answer on Stackoverflow