Python Pandas merge only certain columns

PythonMergePandas

Python Problem Overview


Is it possible to only merge some columns? I have a DataFrame df1 with columns x, y, z, and df2 with columns x, a ,b, c, d, e, f, etc.

I want to merge the two DataFrames on x, but I only want to merge columns df2.a, df2.b - not the entire DataFrame.

The result would be a DataFrame with x, y, z, a, b.

I could merge then delete the unwanted columns, but it seems like there is a better method.

Python Solutions


Solution 1 - Python

You want to use TWO brackets, so if you are doing a VLOOKUP sort of action:

df = pd.merge(df,df2[['Key_Column','Target_Column']],on='Key_Column', how='left')

This will give you everything in the original df + add that one corresponding column in df2 that you want to join.

Solution 2 - Python

You could merge the sub-DataFrame (with just those columns):

df2[list('xab')]  # df2 but only with columns x, a, and b

df1.merge(df2[list('xab')])

Solution 3 - Python

If you want to drop column(s) from the target data frame, but the column(s) are required for the join, you can do the following:

df1 = df1.merge(df2[['a', 'b', 'key1']], how = 'left',
                left_on = 'key2', right_on = 'key1').drop(columns = ['key1'])

The .drop(columns = 'key1') part will prevent 'key1' from being kept in the resulting data frame, despite it being required to join in the first place.

Solution 4 - Python

You can use .loc to select the specific columns with all rows and then pull that. An example is below:

pandas.merge(dataframe1, dataframe2.iloc[:, [0:5]], how='left', on='key')

In this example, you are merging dataframe1 and dataframe2. You have chosen to do an outer left join on 'key'. However, for dataframe2 you have specified .iloc which allows you to specific the rows and columns you want in a numerical format. Using :, your selecting all rows, but [0:5] selects the first 5 columns. You could use .loc to specify by name, but if your dealing with long column names, then .iloc may be better.

Solution 5 - Python

This is to merge selected columns from two tables.

If table_1 contains t1_a,t1_b,t1_c..,id,..t1_z columns, and table_2 contains t2_a, t2_b, t2_c..., id,..t2_z columns, and only t1_a, id, t2_a are required in the final table, then

mergedCSV = table_1[['t1_a','id']].merge(table_2[['t2_a','id']], on = 'id',how = 'left')
# save resulting output file    
mergedCSV.to_csv('output.csv',index = False)

Solution 6 - Python

Slight extension of the accepted answer for multi-character column names, using inner join by default:

df1 = df1.merge(df2[["Key_Column", "Target_Column1", "Target_Column2"]])

This assumes that Key_Column is the only column both dataframes have in common.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionBubbleGuppiesView Question on Stackoverflow
Solution 1 - PythonArthur D. HowlandView Answer on Stackoverflow
Solution 2 - PythonAndy HaydenView Answer on Stackoverflow
Solution 3 - PythontonneofashView Answer on Stackoverflow
Solution 4 - PythonTerrance DeJesusView Answer on Stackoverflow
Solution 5 - PythonMarco167View Answer on Stackoverflow
Solution 6 - PythonCornelius RoemerView Answer on Stackoverflow