pandas: merge (join) two data frames on multiple columns


Python Problem Overview


I am trying to join two pandas data frames using two columns:

new_df = pd.merge(A_df, B_df,  how='left', left_on='[A_c1,c2]', right_on = '[B_c1,c2]')

but got the following error:

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4164)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4028)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13166)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13120)()

KeyError: '[B_1, c2]'

Any idea what should be the right way to do this? Thanks!

Python Solutions


Solution 1 - Python

Try this, passing lists of column names instead of a single string:

new_df = pd.merge(A_df, B_df, how='left', left_on=['A_c1', 'c2'], right_on=['B_c1', 'c2'])

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html

> left_on : label or list, or array-like
> Field names to join on in left DataFrame. Can be a vector or list of vectors of the length of the DataFrame to use a particular vector as the join key instead of columns
>
> right_on : label or list, or array-like
> Field names to join on in right DataFrame or vector/list of vectors per left_on docs
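A minimal runnable sketch of the list form follows. The frames `A_df` and `B_df` and all their values are hypothetical; only the key column names (`A_c1`/`c2` on the left, `B_c1`/`c2` on the right) come from the question.

```python
import pandas as pd

# Hypothetical data: the first key column is named differently on each side,
# the second key column ("c2") has the same name on both sides.
A_df = pd.DataFrame({"A_c1": [1, 1, 2], "c2": ["x", "y", "x"], "a_val": [10, 20, 30]})
B_df = pd.DataFrame({"B_c1": [1, 2], "c2": ["x", "x"], "b_val": [100, 300]})

# Lists of labels: the i-th entry of left_on is matched with the i-th entry of right_on.
new_df = pd.merge(A_df, B_df, how="left", left_on=["A_c1", "c2"], right_on=["B_c1", "c2"])
print(new_df)
```

Because the first key pair has different names, both A_c1 and B_c1 appear in the result, and the left row with no match, (1, "y"), gets NaN in the right-hand columns.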

Solution 2 - Python

The problem here is that by wrapping the whole list in quotes you are passing a single string. As @Shijo quoted from the documentation, left_on and right_on expect a label or a list of labels, and a single string is treated as one label, so pandas looks for a column literally named '[A_c1,c2]' (or '[B_c1,c2]'), does not find it, and raises the KeyError. To join on several columns, pass a list in which each column name is quoted individually, for both the left and the right data frame. That is why this is incorrect:

new_df = pd.merge(A_df, B_df,  how='left', left_on='[A_c1,c2]', right_on = '[B_c1,c2]')

And this is the correct way of using the function:

new_df = pd.merge(A_df, B_df, how='left', left_on=['A_c1', 'c2'], right_on=['B_c1', 'c2'])
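To see the difference directly, here is a small sketch with hypothetical two-row frames. It runs both versions and catches the KeyError raised by the string form, since pandas treats '[A_c1,c2]' as the name of one (nonexistent) column.

```python
import pandas as pd

# Hypothetical frames using the column names from the question.
A_df = pd.DataFrame({"A_c1": [1, 2], "c2": ["x", "y"]})
B_df = pd.DataFrame({"B_c1": [1, 2], "c2": ["x", "y"]})

# Incorrect: each argument is ONE string, i.e. one column label that does not exist.
raised = False
try:
    pd.merge(A_df, B_df, how="left", left_on="[A_c1,c2]", right_on="[B_c1,c2]")
except KeyError as exc:
    raised = True
    print("KeyError:", exc)

# Correct: a list of labels, one per key column.
ok = pd.merge(A_df, B_df, how="left", left_on=["A_c1", "c2"], right_on=["B_c1", "c2"])
print(ok)
```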

Solution 3 - Python

Another way of doing this is to call merge as a method of the left data frame:

new_df = A_df.merge(B_df, left_on=['A_c1', 'c2'], right_on=['B_c1', 'c2'], how='left')
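The method form takes the same keyword arguments as pd.merge, with the calling frame as the left side. A quick check on hypothetical data that both spellings produce the same frame:

```python
import pandas as pd

# Hypothetical data with the question's key column names.
A_df = pd.DataFrame({"A_c1": [1, 2, 3], "c2": ["x", "y", "x"], "a_val": [10, 20, 30]})
B_df = pd.DataFrame({"B_c1": [1, 3], "c2": ["x", "x"], "b_val": [100, 300]})

merge_kwargs = dict(left_on=["A_c1", "c2"], right_on=["B_c1", "c2"], how="left")

# A_df.merge(B_df, ...) is the method form of pd.merge(A_df, B_df, ...).
via_function = pd.merge(A_df, B_df, **merge_kwargs)
via_method = A_df.merge(B_df, **merge_kwargs)

print(via_function.equals(via_method))
```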

Solution 4 - Python

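You can use the following, which is short and simple to understand. It works when the key columns have the same names in both data frames (note that how defaults to 'inner'):

merged_data = df1.merge(df2, on=["column1", "column2"])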
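A sketch of the on form with hypothetical frames df1 and df2 that share the key names column1 and column2. It contrasts the default inner join with how="left"; with on, each key column appears only once in the result.

```python
import pandas as pd

# Hypothetical frames whose key columns have the same names on both sides.
df1 = pd.DataFrame({"column1": [1, 1, 2], "column2": ["a", "b", "a"], "left_val": [10, 20, 30]})
df2 = pd.DataFrame({"column1": [1, 2, 3], "column2": ["a", "a", "a"], "right_val": [100, 200, 300]})

# Default how="inner": only key pairs present in both frames are kept.
inner = df1.merge(df2, on=["column1", "column2"])

# how="left": every row of df1 is kept; unmatched rows get NaN in right_val.
left_joined = df1.merge(df2, on=["column1", "column2"], how="left")

print(inner)
print(left_joined)
```

If the key columns are named differently, as in the question, on cannot be used directly; renaming the right-hand columns first (for example with DataFrame.rename) is one way to make it apply.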

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Edamame | View Question on Stackoverflow
Solution 1 - Python | Shijo | View Answer on Stackoverflow
Solution 2 - Python | Celius Stingher | View Answer on Stackoverflow
Solution 3 - Python | john ed | View Answer on Stackoverflow
Solution 4 - Python | Ali karimi | View Answer on Stackoverflow