Count number of rows by group using dplyr


R Problem Overview


I am using the mtcars dataset. I want to find the number of records for each combination of values in some columns, very similar to COUNT(*) with a GROUP BY clause in SQL. ddply() from plyr works for me:

library(plyr)
ddply(mtcars, .(cyl, gear), nrow)

gives this output:

  cyl gear V1
1   4    3  1
2   4    4  8
3   4    5  2
4   6    3  2
5   6    4  4
6   6    5  1
7   8    3 12
8   8    5  2
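For comparison, the same counts can also be obtained in base R without plyr or dplyr. This is a minimal sketch using table() and as.data.frame(), a swapped-in technique that is not part of the original question. table() lists every cyl/gear combination, including ones that never occur, so those zero counts are filtered out.

```r
# Base R sketch: count rows per (cyl, gear) combination, like SQL COUNT(*) ... GROUP BY
counts <- as.data.frame(table(cyl = mtcars$cyl, gear = mtcars$gear))

# table() also reports combinations that never occur (Freq == 0); drop them
counts <- counts[counts$Freq > 0, ]
print(counts)
```

The rows come out in a different order from ddply() (cyl varies fastest within each gear), and cyl and gear become factors rather than numbers.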

With dplyr, however, this code

library(dplyr)
g <- group_by(mtcars, cyl, gear)
summarise(g, length(gear))

returns only a single overall count:

  length(cyl)
1          32

I found various functions to pass to summarise(), but none of them seem to work for me. One was sum(G), which returned

Error in eval(expr, envir, enclos) : object 'G' not found

I also tried n(), which returned

Error in n() : This function should not be called directly

What am I doing wrong? How can I get group_by() / summarise() to work for me?

R Solutions


Solution 1 - R

There's a special function n() in dplyr to count rows (potentially within groups):

library(dplyr)
mtcars %>% 
  group_by(cyl, gear) %>% 
  summarise(n = n())
#Source: local data frame [8 x 3]
#Groups: cyl [?]
#
#    cyl  gear     n
#  (dbl) (dbl) (int)
#1     4     3     1
#2     4     4     8
#3     4     5     2
#4     6     3     2
#5     6     4     4
#6     6     5     1
#7     8     3    12
#8     8     5     2
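When data is grouped by several columns, summarise() removes only the last grouping level, so the result above is still grouped by cyl (see the #Groups line). In dplyr 1.0.0 and later it also prints a message about this. A sketch, assuming dplyr >= 1.0.0, that asks for an ungrouped result explicitly with the .groups argument:

```r
library(dplyr, warn.conflicts = FALSE)

# Count rows per (cyl, gear) group; .groups = "drop" returns an ungrouped result
res <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(n = n(), .groups = "drop")

print(res)
```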

dplyr also offers a handy count() function that does exactly the same with less typing:

count(mtcars, cyl, gear)          # or mtcars %>% count(cyl, gear)
#Source: local data frame [8 x 3]
#Groups: cyl [?]
#
#    cyl  gear     n
#  (dbl) (dbl) (int)
#1     4     3     1
#2     4     4     8
#3     4     5     2
#4     6     3     2
#5     6     4     4
#6     6     5     1
#7     8     3    12
#8     8     5     2

Solution 2 - R

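Another approach is to use double colons, so that dplyr's group_by() and summarise() are called even if another attached package (such as plyr, loaded after dplyr) masks them: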

mtcars %>% 
  dplyr::group_by(cyl, gear) %>%
  dplyr::summarise(length(gear))
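The double-colon form matters most when plyr is attached after dplyr: plyr's own summarise() then masks dplyr's, ignores the grouping, and evaluates n() outside a dplyr verb, which would explain both symptoms described in the question. A sketch of how to check which package a bare summarise() resolves to, using base R's find() and environment(); it assumes both plyr and dplyr (>= 1.0.0, for .groups) are installed.

```r
library(dplyr, warn.conflicts = FALSE)
library(plyr, warn.conflicts = FALSE)  # attached last, so plyr::summarise() masks dplyr's

# Search-path entries that provide "summarise"; a bare call uses the first one
where <- find("summarise")
print(where)

# Namespace that the bare name summarise currently resolves to
resolved <- environmentName(environment(summarise))
print(resolved)

# Fully qualified calls are not affected by the masking
counts <- mtcars %>%
  dplyr::group_by(cyl, gear) %>%
  dplyr::summarise(n = n(), .groups = "drop")
print(counts)
```

Attaching plyr before dplyr avoids the masking in the first place.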

Solution 3 - R

I think what you are looking for is the following:

cars_by_cylinders_gears <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(count = n())

This uses the dplyr package and is essentially the longhand version of the count() solution provided by docendo discimus.
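Since dplyr 1.1.0, summarise() also accepts a .by argument that groups for a single call, so no group_by()/ungroup() pair is needed and the result comes back ungrouped. A sketch assuming dplyr >= 1.1.0; unlike group_by(), .by keeps groups in order of first appearance, so the result is sorted explicitly here.

```r
library(dplyr, warn.conflicts = FALSE)

# .by groups only for this summarise() call; the result is not grouped
cars_by_cylinders_gears <- mtcars %>%
  summarise(count = n(), .by = c(cyl, gear)) %>%
  arrange(cyl, gear)  # .by keeps first-appearance order, so sort for readability

print(cars_by_cylinders_gears)
```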

Solution 4 - R

Another option, not necessarily more elegant, but one that does not require referring to a specific column:

mtcars %>% 
  group_by(cyl, gear) %>%
  do(data.frame(nrow=nrow(.)))

This is equivalent to using count():

library(dplyr, warn.conflicts = FALSE)
all.equal(mtcars %>% 
            group_by(cyl, gear) %>%
            do(data.frame(n=nrow(.))) %>% 
            ungroup(),
          count(mtcars, cyl, gear), check.attributes=FALSE)
#> [1] TRUE
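do() is marked as superseded in current dplyr releases. If the goal is still to apply a function to each group's data frame without naming a column, group_modify() (available since dplyr 0.8.0) is one possible replacement; this is a sketch, not the original answer's code.

```r
library(dplyr, warn.conflicts = FALSE)

# group_modify() passes each group's rows to the function as .x (without the
# grouping columns) and binds the one-row results back together with the keys
res <- mtcars %>%
  group_by(cyl, gear) %>%
  group_modify(~ data.frame(nrow = nrow(.x))) %>%
  ungroup()

print(res)
```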

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type      Original Author   Original Content on Stackoverflow
Question          charmee           View Question on Stackoverflow
Solution 1 - R    talat             View Answer on Stackoverflow
Solution 2 - R    user3026255       View Answer on Stackoverflow
Solution 3 - R    tb.               View Answer on Stackoverflow
Solution 4 - R    Matifou           View Answer on Stackoverflow