Merge two data frames based on common column values in Pandas

PandasDataframe

Pandas Problem Overview


How to get merged data frame from two data frames having common column value such that only those rows make merged data frame having common value in a particular column.

I have 5000 rows of df1 as format : -

	director_name	actor_1_name	actor_2_name	actor_3_name	movie_title
0	James Cameron	CCH Pounder	Joel David Moore	Wes Studi	  Avatar
1	Gore Verbinski	Johnny Depp	Orlando Bloom	Jack Davenport	 Pirates 
    of the Caribbean: At World's End
2	Sam Mendes	 Christoph Waltz	Rory Kinnear	Stephanie Sigman Spectre

and 10000 rows of df2 as

movieId		              genres	                    movie_title
	1		Adventure|Animation|Children|Comedy|Fantasy	  Toy Story
	2		Adventure|Children|Fantasy	                  Jumanji
	3		Comedy|Romance	                           Grumpier Old Men
	4		Comedy|Drama|Romance	                  Waiting to Exhale
	

A common column 'movie_title' have common values and based on them, I want to get all rows where 'movie_title' is same. Other rows to be deleted.

Any help/suggestion would be appreciated.

Note: I already tried

pd.merge(dfinal, df1, on='movie_title')

and output comes like one row

director_name	actor_1_name	actor_2_name	actor_3_name	movie_title	movieId	title	genres

and on how ="outer"/"left", "right", I tried all and didn't get any row after dropping NaN although many common coloumn do exist.

Pandas Solutions


Solution 1 - Pandas

You can use pd.merge:

import pandas as pd
pd.merge(df1, df2, on="movie_title")

Only rows are kept for which common keys are found in both data frames. In case you want to keep all rows from the left data frame and only add values from df2 where a matching key is available, you can use how="left":

pd.merge(df1, df2, on="movie_title", how="left")

Solution 2 - Pandas

We can merge two Data frames in several ways. Most common way in python is using merge operation in Pandas.

import pandas
dfinal = df1.merge(df2, on="movie_title", how = 'inner')

For merging based on columns of different dataframe, you may specify left and right common column names specially in case of ambiguity of two different names of same column, lets say - 'movie_title' as 'movie_name'.

dfinal = df1.merge(df2, how='inner', left_on='movie_title', right_on='movie_name')

If you want to be even more specific, you may read the documentation of pandas merge operation.

Solution 3 - Pandas

If you want to merge two DataFrames and you want a merged DataFrame in which only common values from both data frames will appear then do inner merge.

import pandas as pd

merged_Frame = pd.merge(df1, df2, on = id, how='inner')

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionHari_pbView Question on Stackoverflow
Solution 1 - PandaspansenView Answer on Stackoverflow
Solution 2 - PandasHari_pbView Answer on Stackoverflow
Solution 3 - Pandasgaurav singhView Answer on Stackoverflow