How to combine multiple conditions to subset a data-frame using "OR"?

RConditionalDataframe

R Problem Overview


I have a data.frame in R. I want to try two different conditions on two different columns, but I want these conditions to be inclusive. Therefore, I would like to use "OR" to combine the conditions. I have used the following syntax before with lot of success when I wanted to use the "AND" condition.

my.data.frame <- data[(data$V1 > 2) & (data$V2 < 4), ]

But I don't know how to use an 'OR' in the above.

R Solutions


Solution 1 - R

my.data.frame <- subset(data , V1 > 2 | V2 < 4)

An alternative solution that mimics the behavior of this function and would be more appropriate for inclusion within a function body:

new.data <- data[ which( data$V1 > 2 | data$V2 < 4) , ]

Some people criticize the use of which as not needed, but it does prevent the NA values from throwing back unwanted results. The equivalent (.i.e not returning NA-rows for any NA's in V1 or V2) to the two options demonstrated above without the which would be:

 new.data <- data[ !is.na(data$V1 | data$V2) & ( data$V1 > 2 | data$V2 < 4)  , ]

Note: I want to thank the anonymous contributor that attempted to fix the error in the code immediately above, a fix that got rejected by the moderators. There was actually an additional error that I noticed when I was correcting the first one. The conditional clause that checks for NA values needs to be first if it is to be handled as I intended, since ...

> NA & 1
[1] NA
> 0 & NA
[1] FALSE

Order of arguments may matter when using '&".

Solution 2 - R

You are looking for "|." See http://cran.r-project.org/doc/manuals/R-intro.html#Logical-vectors

my.data.frame <- data[(data$V1 > 2) | (data$V2 < 4), ]

Solution 3 - R

Just for the sake of completeness, we can use the operators [ and [[:

set.seed(1)
df <- data.frame(v1 = runif(10), v2 = letters[1:10])

Several options

df[df[1] < 0.5 | df[2] == "g", ] 
df[df[[1]] < 0.5 | df[[2]] == "g", ] 
df[df["v1"] < 0.5 | df["v2"] == "g", ]

df$name is equivalent to df[["name", exact = FALSE]]

Using dplyr:

library(dplyr)
filter(df, v1 < 0.5 | v2 == "g")

Using sqldf:

library(sqldf)
sqldf('SELECT *
      FROM df 
      WHERE v1 < 0.5 OR v2 = "g"')

Output for the above options:

          v1 v2
1 0.26550866  a
2 0.37212390  b
3 0.20168193  e
4 0.94467527  g
5 0.06178627  j

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionSamView Question on Stackoverflow
Solution 1 - RIRTFMView Answer on Stackoverflow
Solution 2 - RncrayView Answer on Stackoverflow
Solution 3 - RmpalancoView Answer on Stackoverflow