data.frame Group By column

RAggregate

R Problem Overview


I have a data frame DF.

Say DF is:

  A B
1 1 2
2 1 3
3 2 3
4 3 5
5 3 6 

Now I want to combine together the rows by the column A and to have the sum of the column B.

For example:

  A B
1 1 5
2 2 3
3 3 11

I am doing this currently using an SQL query with the sqldf function. But for some reason it is very slow. Is there any more convenient way to do that? I could do it manually too using a for loop but it is again slow. My SQL query is " Select A,Count(B) from DF group by A".

In general whenever I don't use vectorized operations and I use for loops the performance is extremely slow even for single procedures.

R Solutions


Solution 1 - R

This is a common question. In base, the option you're looking for is aggregate. Assuming your data.frame is called "mydf", you can use the following.

> aggregate(B ~ A, mydf, sum)
  A  B
1 1  5
2 2  3
3 3 11

I would also recommend looking into the "data.table" package.

> library(data.table)
> DT <- data.table(mydf)
> DT[, sum(B), by = A]
   A V1
1: 1  5
2: 2  3
3: 3 11

Solution 2 - R

Using dplyr:

require(dplyr)    
df <- data.frame(A = c(1, 1, 2, 3, 3), B = c(2, 3, 3, 5, 6))
df %>% group_by(A) %>% summarise(B = sum(B))

## Source: local data frame [3 x 2]
## 
##   A  B
## 1 1  5
## 2 2  3
## 3 3 11

With sqldf:

library(sqldf)
sqldf('SELECT A, SUM(B) AS B FROM df GROUP BY A')

Solution 3 - R

I would recommend having a look at the plyr package. It might not be as fast as data.table or other packages, but it is quite instructive, especially when starting with R and having to do some data manipulation.

> DF <- data.frame(A = c("1", "1", "2", "3", "3"), B = c(2, 3, 3, 5, 6))
> library(plyr)
> DF.sum <- ddply(DF, c("A"), summarize, B = sum(B))
> DF.sum
  A  B
1 1  5
2 2  3
3 3 11

Solution 4 - R

require(reshape2)

T <- melt(df, id = c("A"))

T <- dcast(T, A ~ variable, sum)

I am not certain the exact advantages over aggregate.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionnikosdiView Question on Stackoverflow
Solution 1 - RA5C1D2H2I1M1N2O1R2T1View Answer on Stackoverflow
Solution 2 - RmpalancoView Answer on Stackoverflow
Solution 3 - Rr0bertView Answer on Stackoverflow
Solution 4 - RSocView Answer on Stackoverflow