Aggregating by unique identifier and concatenating related values into a string

RConcatenationAggregate

R Problem Overview


I have a need that I imagine could be satisfied by aggregate or reshape, but I can't quite figure out.

I have a list of names (brand), and accompanying ID number (id). This data is in long form, so names can have multiple ID's. I'd like to de-dupicate by the name (brand) and concatenate the multiple possible id's into a string separated by a comment.

For example:

brand            id 
RadioShack	     2308
Rag & Bone	     4466
Ragu	         1830
Ragu	         4518
Ralph Lauren	 1638
Ralph Lauren	 2719
Ralph Lauren	 2720
Ralph Lauren	 2721
Ralph Lauren	 2722 

should become:

RadioShack	     2308
Rag & Bone	     4466
Ragu	         1830,4518
Ralph Lauren	 1638,2719,2720,2721,2722

How would I accomplish this?

R Solutions


Solution 1 - R

Let's call your data.frame DF

> aggregate(id ~ brand, data = DF, c)
         brand                           id
1   RadioShack                         2308
2   Rag & Bone                         4466
3         Ragu                   1830, 4518
4 Ralph Lauren 1638, 2719, 2720, 2721, 2722

Another alternative using aggregate is:

result <- aggregate(id ~ brand, data = DF, paste, collapse = ",")

This produces the same result and now id is not a list anymore. Thanks to @Frank comment. To see the class of each column try:

> sapply(result, class)
      brand          id 
   "factor" "character"

As mentioned by @DavidArenburg in the comments, another alternative is using the toString function:

aggregate(id ~ brand, data = DF, toString)

Solution 2 - R

A nice clean one line in data.table

library(data.table)
setDT(DF)
TWO OPTIONS:

results as a list

DF[ , .(id = list(id)), by = brand]
          brand                       id
1:   RadioShack                     2308
2:   Rag & Bone                     4466
3:         Ragu                1830,4518
4: Ralph Lauren 1638,2719,2720,2721,2722
> 

results as a string

DF[ , .(id = paste(id, collapse=",")), by = brand]
          brand                       id
1:   RadioShack                     2308
2:   Rag & Bone                     4466
3:         Ragu                1830,4518
4: Ralph Lauren 1638,2719,2720,2721,2722

Note

Even though the two results appear the same (that is when you print them, they look identical), they are in fact very different and allow for different functionality.

Namely, using the list option (the first one) allows you to then perform functions on the orignal ids.

The latter will allow you to display the information more easily (including exporting to CSV or excel), but to operate on the id's will require splicing them back.

Solution 3 - R

Or using [dplyr][1]:

library(dplyr)
DF %>%
  group_by(brand) %>%
  summarise(id = paste(id, collapse = ","))

Where DF is the name of your data.frame. [1]: http://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html

Solution 4 - R

Here's the information in base R:

myby <- by(df$id,df$brand,function(x)paste(x,collapse=","))

The formatting of "by" objects is weird. You can take data.frame(id=c(myby)) and the brands will become rownames:

#                                    id
# RadioShack                       2308
# Rag & Bone                       4466
# Ragu                        1830,4518
# Ralph Lauren 1638,2719,2720,2721,2722

Alternately, if you load the data.table package, this will work:

dt <- data.table(df)
dt[,paste(id,collapse=","),by=brand]
#           brand                       V1
# 1:   RadioShack                     2308
# 2:   Rag & Bone                     4466
# 3:         Ragu                1830,4518
# 4: Ralph Lauren 1638,2719,2720,2721,2722

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionroodyView Question on Stackoverflow
Solution 1 - RJilber UrbinaView Answer on Stackoverflow
Solution 2 - RRicardo SaportaView Answer on Stackoverflow
Solution 3 - RSam FirkeView Answer on Stackoverflow
Solution 4 - RFrankView Answer on Stackoverflow