Unique on a dataframe with only selected columns

R Problem Overview


I have a data frame with more than 100 columns, and I would like to find the unique rows by comparing only two of the columns. I'm hoping this is an easy one, but I can't get it to work with unique or duplicated myself.

In the example below, I would like to deduplicate using only id and id2:

data.frame(id=c(1,1,3),id2=c(1,1,4),somevalue=c("x","y","z"))

id id2 somevalue
 1   1         x
 1   1         y
 3   4         z

I would like to obtain either:

id id2 somevalue
 1   1         x
 3   4         z

or:

id id2 somevalue
 1   1         y
 3   4         z

(I have no preference as to which of the duplicate rows is kept.)

R Solutions


Solution 1 - R

Ok, if it doesn't matter which value in the non-duplicated column you select, this should be pretty easy:

dat <- data.frame(id=c(1,1,3),id2=c(1,1,4),somevalue=c("x","y","z"))
# keep only rows whose (id, id2) pair has not been seen before
dat[!duplicated(dat[,c('id','id2')]),]
#   id id2 somevalue
# 1  1   1         x
# 3  3   4         z

Inside the duplicated call, I pass only the columns of dat that should be checked for duplicates. This code always keeps the first row of each set of duplicates (in this case, x).
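If you would rather keep the last row of each duplicate set, a minimal sketch, assuming the same dat as above, is to pass fromLast = TRUE to duplicated(). The name last_kept is only for illustration:

```r
dat <- data.frame(id = c(1, 1, 3), id2 = c(1, 1, 4), somevalue = c("x", "y", "z"))

# fromLast = TRUE scans from the bottom, so the last row of each
# (id, id2) pair is the one that is not flagged as a duplicate
last_kept <- dat[!duplicated(dat[, c("id", "id2")], fromLast = TRUE), ]
print(last_kept)
#   id id2 somevalue
# 2  1   1         y
# 3  3   4         z
```

This gives the second of the two results the question lists as acceptable.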

Solution 2 - R

Here are a couple of dplyr options that keep the first row for each distinct combination of id and id2:

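library(dplyr)
# example data from the question
df <- data.frame(id=c(1,1,3),id2=c(1,1,4),somevalue=c("x","y","z"))
df %>% distinct(id, id2, .keep_all = TRUE)
# the group_by() variants return a grouped tibble; append ungroup() if needed
df %>% group_by(id, id2) %>% filter(row_number() == 1)
df %>% group_by(id, id2) %>% slice(1)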

Solution 3 - R

Using unique() on the key columns, then indexing the full data frame by the row names that survive:

dat <- data.frame(id=c(1,1,3),id2=c(1,1,4),somevalue=c("x","y","z"))
dat[row.names(unique(dat[,c("id", "id2")])),]
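This works because unique() on a data frame keeps the row names of the first occurrences. As a rough check, the sketch below compares the rows picked this way with the rows picked by the duplicated() approach. The names keys, kept_names, via_unique and via_duplicated are mine, not from the answers:

```r
dat <- data.frame(id = c(1, 1, 3), id2 = c(1, 1, 4), somevalue = c("x", "y", "z"))
keys <- c("id", "id2")

# unique() keeps the row names of the first occurrence of each key pair
kept_names <- row.names(unique(dat[, keys]))

via_unique <- dat[kept_names, ]
via_duplicated <- dat[!duplicated(dat[, keys]), ]

# TRUE when both approaches select the same rows
same_rows <- all(row.names(via_unique) == row.names(via_duplicated))
print(same_rows)
```

Note that this relies on the row names of dat being unique, which is always the case for a data frame.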

Solution 4 - R

A minor update to the unique() code in Solution 3.
If you only need the distinct combinations of the two columns, select just those columns. This drops somevalue, so there is no ambiguity about which row is kept:

dat <- data.frame(id=c(1,1,3), id2=c(1,1,4), somevalue=c("x","y","z"))
dat[row.names(unique(dat[,c("id", "id2")])), c("id", "id2")]
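Since only the key columns are returned here, the same result can probably be had more directly by calling unique() on the two-column subset. This is a sketch of that shortcut, not the answer's own code, and keys_only is an illustrative name:

```r
dat <- data.frame(id = c(1, 1, 3), id2 = c(1, 1, 4), somevalue = c("x", "y", "z"))

# distinct (id, id2) combinations; somevalue is dropped entirely
keys_only <- unique(dat[, c("id", "id2")])
print(keys_only)
#   id id2
# 1  1   1
# 3  3   4
```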

Attributions

All content on this page is sourced from the original question and its answers on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type     Original Author   Original Content on Stack Overflow
Question         Ina               View Question on Stack Overflow
Solution 1 - R   joran             View Answer on Stack Overflow
Solution 2 - R   sbha              View Answer on Stack Overflow
Solution 3 - R   Gary Feng         View Answer on Stack Overflow
Solution 4 - R   Vaya Ashish       View Answer on Stack Overflow