Adding two pandas dataframes

PythonPandas

Python Problem Overview


I have two dataframes, both indexed by timeseries. I need to add the elements together to form a new dataframe, but only if the index and column are the same. If the item does not exist in one of the dataframes then it should be treated as a zero.

I've tried using .add but this sums regardless of index and column. Also tried a simple combined_data = dataframe1 + dataframe2 but this give a NaN if both dataframes don't have the element.

Any suggestions?

Python Solutions


Solution 1 - Python

How about x.add(y, fill_value=0)?

import pandas as pd

df1 = pd.DataFrame([(1,2),(3,4),(5,6)], columns=['a','b'])
Out: 
   a  b
0  1  2
1  3  4
2  5  6

df2 = pd.DataFrame([(100,200),(300,400),(500,600)], columns=['a','b'])
Out: 
     a    b
0  100  200
1  300  400
2  500  600

df_add = df1.add(df2, fill_value=0)
Out: 
     a    b
0  101  202
1  303  404
2  505  606

Solution 2 - Python

If I understand you correctly, you want something like:

(x.reindex_like(y).fillna(0) + y.fillna(0).fillna(0))

This will give the sum of the two dataframes. If a value is in one dataframe and not the other, the result at that position will be that existing value (look at B0 in X and B0 in Y and look at final output). If a value is missing in both dataframes, the result at that position will be zero (look at B1 in X and B1 in Y and look at final output).

>>> x
   A   B   C
0  1   2 NaN
1  3 NaN   4
>>> y
    A   B   C
0   8 NaN  88
1   2 NaN   5
2  10  11  12
>>> (x.reindex_like(y).fillna(0) + y.fillna(0).fillna(0))
    A   B   C
0   9   2  88
1   5   0   9
2  10  11  12

Solution 3 - Python

For making more general the answer... first I will take the common index for synchronizing both dataframes, then I will join each of them to my pattern (dates) and I will sum the columns of the same name and finally join both dataframes (deleting added columns in one of them),

you can see an example (with google's stock prices taken from google) here:

import numpy as np
import pandas as pd
import datetime as dt

prices = pd.DataFrame([[553.0, 555.5, 549.3, 554.11, 0],
                       [556.8, 556.8, 544.05, 545.92, 545.92],
                       [545.5, 546.89, 540.97, 542.04, 542.04]],
                       index=[dt.datetime(2014,11,04), dt.datetime(2014,11,05), dt.datetime(2014,11,06)],
                       columns=['Open', 'High', 'Low', 'Close', 'Adj Close'])

corrections = pd.DataFrame([[0, 555.22], [1238900, 0]],
					index=[dt.datetime(2014,11,3), dt.datetime(2014,11,4)],
					columns=['Volume', 'Adj Close'])
     
dates = pd.DataFrame(prices.index, columns = ['Dates']).append(pd.DataFrame(corrections.index, columns = ['Dates'])).drop_duplicates('Dates').set_index('Dates').sort(axis=0)
df_corrections = dates.join(corrections).fillna(0)
df_prices = dates.join(prices).fillna(0)

for col in prices.columns:
    if col in corrections.columns:
        df_prices[col]+=df_corrections[col]
        del df_corrections[col]
        
df_prices = df_prices.join(df_corrections)

Solution 4 - Python

Both the above answers - fillna(0) and a direct addition would give you Nan values if either of them have different structures.

Its Better to use fill_value

df.add(other_df, fill_value=0)

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
Questioncs0679View Question on Stackoverflow
Solution 1 - PythonWes McKinneyView Answer on Stackoverflow
Solution 2 - PythonBrenBarnView Answer on Stackoverflow
Solution 3 - PythonXaviView Answer on Stackoverflow
Solution 4 - PythonPrafulla PallalView Answer on Stackoverflow