Count number of rows matching a criteria

R

R Problem Overview


I am looking for a command in R which is equivalent of this SQL statement. I want this to be a very simple basic solution without using complex functions OR dplyr type of packages.

Select count(*) as number_of_states 
  from myTable
where  sCode = "CA"

so essentially I would be counting number of rows matching my where condition.

I have imported a csv file into mydata as a data frame.So far I have tried these with no avail.

  1. nrow(mydata$sCode == "CA") ## ==>> returns NULL

  2. sum(mydata[mydata$sCode == 'CA',], na.rm=T) ## ==>> gives Error in FUN(X[[1L]], ...) : only defined on a data frame with all numeric variables

  3. sum(subset(mydata, sCode='CA', select=c(sCode)), na.rm=T) ## ==>> FUN(X[[1L]], ...) : only defined on a data frame with all numeric variables

  4. sum(mydata$sCode == "CA", na.rm=T) ## ==>> returns count of all rows in the entire data set, which is not the correct result.

and some variations of the above samples. Any help would be appreciated! Thanks.

R Solutions


Solution 1 - R

mydata$sCode == "CA" will return a boolean array, with a TRUE value everywhere that the condition is met. To illustrate:

> mydata = data.frame(sCode = c("CA", "CA", "AC"))
> mydata$sCode == "CA"
[1]  TRUE  TRUE FALSE

There are a couple of ways to deal with this:

  1. sum(mydata$sCode == "CA"), as suggested in the comments; because TRUE is interpreted as 1 and FALSE as 0, this should return the numer of TRUE values in your vector.

  2. length(which(mydata$sCode == "CA")); the which() function returns a vector of the indices where the condition is met, the length of which is the count of "CA".

Edit to expand upon what's happening in #2:

> which(mydata$sCode == "CA")
[1] 1 2

which() returns a vector identify each column where the condition is met (in this case, columns 1 and 2 of the dataframe). The length() of this vector is the number of occurences.

Solution 2 - R

sum is used to add elements; nrow is used to count the number of rows in a rectangular array (typically a matrix or data.frame); length is used to count the number of elements in a vector. You need to apply these functions correctly.

Let's assume your data is a data frame named "dat". Correct solutions:

nrow(dat[dat$sCode == "CA",])
length(dat$sCode[dat$sCode == "CA"])
sum(dat$sCode == "CA")

Solution 3 - R

  1. mydata$sCode is a vector, it's why nrow output is NULL.

  2. mydata[mydata$sCode == 'CA',] returns data.frame where sCode == 'CA'. sCode includes character. That's why sum gives you the error.

  3. subset(mydata, sCode='CA', select=c(sCode)), you should use sCode=='CA' instead sCode='CA'. Then subset returns you vector where sCode equals CA, so you should use

    length(subset(na.omit(mydata), sCode='CA', select=c(sCode)))

Or you can try this: sum(na.omit(mydata$sCode) == "CA")

Solution 4 - R

Just give a try using subset

nrow(subset(data,condition))

Example

nrow(subset(myData,sCode == "CA"))

Solution 5 - R

With dplyr package, Use

 nrow(filter(mydata, sCode == "CA")),

All the solutions provided here gave me same error as multi-sam but that one worked.

Solution 6 - R

to get the number of observations the number of rows from your Dataset would be more valid:

nrow(dat[dat$sCode == "CA",])

Solution 7 - R

grep command can be used

> CA = mydata[grep("CA", mydata$sCode, ] > > nrow(CA)

Solution 8 - R

Call nrow passing as argument the name of the dataset:

nrow(dataset)

Solution 9 - R

I'm using this short function to make it easier using dplyr:

countc <- function(.data, ..., preserve = FALSE){
   return(nrow(filter(.data, ..., .preserve = preserve)))
}

With this you can just use it like filter. For example:

countc(data, active == TRUE)
[1] 42

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
Questionmulti-samView Question on Stackoverflow
Solution 1 - RJoeView Answer on Stackoverflow
Solution 2 - RAlex WView Answer on Stackoverflow
Solution 3 - RFedorenko KristinaView Answer on Stackoverflow
Solution 4 - RGaurav ShivhareView Answer on Stackoverflow
Solution 5 - Ruser6755896View Answer on Stackoverflow
Solution 6 - RSami NavesiView Answer on Stackoverflow
Solution 7 - Rshabhari View Answer on Stackoverflow
Solution 8 - RTalha RasoolView Answer on Stackoverflow
Solution 9 - RjulianpoempView Answer on Stackoverflow