Left join only selected columns in R with the merge() function

RMerge

R Problem Overview


I am trying to LEFT Join 2 data frames but I do not want join all the variables from the second data set:

As an example, I have dataset 1 (DF1):

  Cl    Q   Sales  Date
   A	2	30	   01/01/2014
   A	3	24	   02/01/2014
   A	1	10	   03/01/2014
   B	4	10	   01/01/2014
   B	1	20	   02/01/2014
   B	3	30	   03/01/2014

And I would like to left join dataset 2 (DF2):

Client	LO	CON
   A	12	CA
   B	11	US
   C	12	UK
   D	10	CA
   E	15	AUS
   F	91	DD

I am able to left join with the following code:

merge(x = DF1, y = DF2, by = "Client", all.x=TRUE) :

   Client Q    Sales   Date             LO      CON
   A      2    30      01/01/2014       12      CA
   A      3    24      02/01/2014       12      CA
   A      1    10      03/01/2014       12      CA
   B      4    10      01/01/2014       11      US
   B      1    20      02/01/2014       11      US
   B      3    30      03/01/2014       11      US

However, it merges both column LO and CON. I would only like to merge the column LO.

   Client Q    Sales   Date             LO      
   A      2    30      01/01/2014       12      
   A      3    24      02/01/2014       12      
   A      1    10      03/01/2014       12      
   B      4    10      01/01/2014       11      
   B      1    20      02/01/2014       11     
   B      3    30      03/01/2014       11      

R Solutions


Solution 1 - R

You can do this by subsetting the data you pass into your merge:

merge(x = DF1, y = DF2[ , c("Client", "LO")], by = "Client", all.x=TRUE)

Or you can simply delete the column after your current merge :)

Solution 2 - R

I think it's a little simpler to use the dplyr functions select and left_join ; at least it's easier for me to understand. The join function from dplyr are made to mimic sql arguments.

 library(tidyverse)
 
 DF2 <- DF2 %>%
   select(client, LO)

 joined_data <- left_join(DF1, DF2, by = "Client")

You don't actually need to use the "by" argument in this case because the columns have the same name.

Solution 3 - R

Nothing elegant but this could be another satisfactory answer.

merge(x = DF1, y = DF2, by = "Client", all.x=TRUE)[,c("Client","LO","CON")]

This will be useful especially when you don't need the keys that were used to join the tables in your results.

Solution 4 - R

Alternative solution using left_join() and select() from the dplyr package, without intermediate steps:

DF1 <- DF1 %>%
  left_join(DF2, by = "Client") %>%
  select(-CON)

Solution 5 - R

For Client column in both tables:

DF_joined <- DF1 %>% left_join(DF2 %>% select(Client,CON))

Solution 6 - R

One-liner with dplyr

DF_joined <- left_join(DF1, select(DF2, -CON), by = "Client")

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionSamer Nachab&#233;View Question on Stackoverflow
Solution 1 - RstanekamView Answer on Stackoverflow
Solution 2 - RBen GView Answer on Stackoverflow
Solution 3 - RAkshay KadidalView Answer on Stackoverflow
Solution 4 - Rrosie-betzlerView Answer on Stackoverflow
Solution 5 - RMabarView Answer on Stackoverflow
Solution 6 - RQuechuaView Answer on Stackoverflow