Adding two pandas dataframes

Python Problem Overview

I have two dataframes, both indexed by timeseries. I need to add the elements together to form a new dataframe, but only if the index and column are the same. If the item does not exist in one of the dataframes then it should be treated as a zero.

I've tried using .add but this sums regardless of index and column. Also tried a simple combined_data = dataframe1 + dataframe2 but this give a NaN if both dataframes don't have the element.

Any suggestions?

Python Solutions

Solution 1 - Python

How about x.add(y, fill_value=0)?

import pandas as pd

df1 = pd.DataFrame([(1,2),(3,4),(5,6)], columns=['a','b'])
Out: 
   a  b
0  1  2
1  3  4
2  5  6

df2 = pd.DataFrame([(100,200),(300,400),(500,600)], columns=['a','b'])
Out: 
     a    b
0  100  200
1  300  400
2  500  600

df_add = df1.add(df2, fill_value=0)
Out: 
     a    b
0  101  202
1  303  404
2  505  606

Solution 2 - Python

If I understand you correctly, you want something like:

(x.reindex_like(y).fillna(0) + y.fillna(0).fillna(0))

This will give the sum of the two dataframes. If a value is in one dataframe and not the other, the result at that position will be that existing value (look at B0 in X and B0 in Y and look at final output). If a value is missing in both dataframes, the result at that position will be zero (look at B1 in X and B1 in Y and look at final output).

>>> x
   A   B   C
0  1   2 NaN
1  3 NaN   4
>>> y
    A   B   C
0   8 NaN  88
1   2 NaN   5
2  10  11  12
>>> (x.reindex_like(y).fillna(0) + y.fillna(0).fillna(0))
    A   B   C
0   9   2  88
1   5   0   9
2  10  11  12

Solution 3 - Python

For making more general the answer... first I will take the common index for synchronizing both dataframes, then I will join each of them to my pattern (dates) and I will sum the columns of the same name and finally join both dataframes (deleting added columns in one of them),

you can see an example (with google's stock prices taken from google) here:

import numpy as np
import pandas as pd
import datetime as dt

prices = pd.DataFrame([[553.0, 555.5, 549.3, 554.11, 0],
                       [556.8, 556.8, 544.05, 545.92, 545.92],
                       [545.5, 546.89, 540.97, 542.04, 542.04]],
                       index=[dt.datetime(2014,11,04), dt.datetime(2014,11,05), dt.datetime(2014,11,06)],
                       columns=['Open', 'High', 'Low', 'Close', 'Adj Close'])

corrections = pd.DataFrame([[0, 555.22], [1238900, 0]],
					index=[dt.datetime(2014,11,3), dt.datetime(2014,11,4)],
					columns=['Volume', 'Adj Close'])
     
dates = pd.DataFrame(prices.index, columns = ['Dates']).append(pd.DataFrame(corrections.index, columns = ['Dates'])).drop_duplicates('Dates').set_index('Dates').sort(axis=0)
df_corrections = dates.join(corrections).fillna(0)
df_prices = dates.join(prices).fillna(0)

for col in prices.columns:
    if col in corrections.columns:
        df_prices[col]+=df_corrections[col]
        del df_corrections[col]
        
df_prices = df_prices.join(df_corrections)

Solution 4 - Python

Both the above answers - fillna(0) and a direct addition would give you Nan values if either of them have different structures.

Its Better to use fill_value

df.add(other_df, fill_value=0)

Content Type	Original Author	Original Content on Stackoverflow
Question	cs0679	View Question on Stackoverflow
Solution 1 - Python	Wes McKinney	View Answer on Stackoverflow
Solution 2 - Python	BrenBarn	View Answer on Stackoverflow
Solution 3 - Python	Xavi	View Answer on Stackoverflow
Solution 4 - Python	Prafulla Pallal	View Answer on Stackoverflow