case_when in mutate pipe

R, dplyr

R Problem Overview


It seems dplyr::case_when doesn't behave like other functions when used inside a dplyr::mutate call. For instance:

library(dplyr)

case_when(mtcars$carb <= 2 ~ "low",
          mtcars$carb > 2 ~ "high") %>% 
  table

works:

.
high  low 
  15   17 

But put case_when in a mutate chain:

mtcars %>% 
  mutate(cg = case_when(carb <= 2 ~ "low",
                        carb > 2 ~ "high"))

and you get:

 Error: object 'carb' not found

while this works fine

mtcars %>% 
  mutate(cg = carb %>% 
           cut(c(0, 2, 8)))

R Solutions


Solution 1 - R

As of version 0.7.0 of dplyr, case_when works within mutate as follows:

library(dplyr) # >= 0.7.0
mtcars %>% 
  mutate(cg = case_when(carb <= 2 ~ "low",
                        carb > 2  ~ "high"))

For more information: http://dplyr.tidyverse.org/reference/case_when.html
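To see which behaviour applies on a given machine, one option is to check the installed version before running the pipeline. This is only a minimal sketch: it assumes dplyr is installed, and the name labelled is illustrative.

```r
library(dplyr)

# case_when() inside mutate() needs dplyr 0.7.0 or later (per the answer above)
stopifnot(packageVersion("dplyr") >= "0.7.0")

labelled <- mtcars %>%
  mutate(cg = case_when(carb <= 2 ~ "low",
                        carb > 2  ~ "high"))

# Tabulate the new column to compare with the counts shown in the question
table(labelled$cg)
```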

Solution 2 - R

We can use .$ to refer explicitly to the columns of the data frame being piped in:

mtcars %>%  
     mutate(cg = case_when(.$carb <= 2 ~ "low",  .$carb > 2 ~ "high")) %>%
    .$cg %>%
    table()
# high  low 
#  15   17 
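This works because . is the magrittr placeholder for the data frame on the left-hand side of the pipe, so .$carb is an ordinary column vector and case_when never has to look up carb on its own. A small sketch of that equivalence (the names via_dot and direct are illustrative):

```r
library(dplyr)

# Inside the pipe, `.` stands for mtcars, so .$carb is just mtcars$carb
via_dot <- mtcars %>%
  mutate(cg = case_when(.$carb <= 2 ~ "low", .$carb > 2 ~ "high"))

# The same rules applied to the column directly, outside any pipe
direct <- case_when(mtcars$carb <= 2 ~ "low", mtcars$carb > 2 ~ "high")

# Both routes should give the same character vector
identical(via_dot$cg, direct)
```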

Solution 3 - R

With thanks to @sumedh: @hadley has explained that this is a known shortcoming of case_when:

> case_when() is still somewhat experiment[al] and does not currently work inside mutate(). That will be fixed in a future version.

Solution 4 - R

In my case, quasiquotation helped a lot. You can create, in advance, a set of quoted formulas that define the mutation rules. Either use known column names, as in the first formula, or use !! to build rules dynamically, as in the second. The set is then spliced into the mutate/case_when combination with !!!, like this:

    library(dplyr)
    library(rlang)
    pattern <- quos(gear == 3L ~ "three", !!sym("gear") == 4L ~ "four", gear == 5L ~ "five")
    # Or
    # pattern <- list(
    #     quo(gear == 3L ~ "three"), 
    #     quo(!!sym("gear") == 4L ~ "four"),
    #     quo(gear == 5L ~ "five"))
    #
    mtcars %>% mutate(test = case_when(!!!pattern)) %>% head(10L)
#>     mpg cyl  disp  hp drat    wt  qsec vs am gear carb  test
#> 1  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4  four
#> 2  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4  four
#> 3  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1  four
#> 4  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1 three
#> 5  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2 three
#> 6  18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1 three
#> 7  14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4 three
#> 8  24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2  four
#> 9  22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2  four
#> 10 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4  four
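The dynamic variant can be taken one step further by passing the column name as a string. The following is a rough sketch rather than part of the original answer: label_column is a hypothetical helper, and the TRUE ~ "other" fallback is an added assumption.

```r
library(dplyr)
library(rlang)

# Hypothetical helper: label the values 3 and 4 in whichever column is named
label_column <- function(data, col) {
  col <- sym(col)  # turn the string into a symbol so it can be unquoted with !!
  rules <- quos(
    !!col == 3L ~ "three",
    !!col == 4L ~ "four",
    TRUE ~ "other"   # assumed fallback for every remaining row
  )
  data %>% mutate(label = case_when(!!!rules))
}

by_gear <- label_column(mtcars, "gear")
by_carb <- label_column(mtcars, "carb")
head(by_gear$label)
head(by_carb$label)
```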

I prefer this solution because it allows creating complex rules, e.g. using map2 with LHS conditions and RHS values to generate quoted formulas

    library(rlang)
    library(purrr)
    map2(c(3, 4, 5), c("three", "four", "five"), ~quo(gear == !!.x ~ !!.y))
#> [[1]]
#> <quosure>
#> expr: ^gear == 3 ~ "three"
#> env:  0000000014286520
#> 
#> [[2]]
#> <quosure>
#> expr: ^gear == 4 ~ "four"
#> env:  000000001273D0E0
#> 
#> [[3]]
#> <quosure>
#> expr: ^gear == 5 ~ "five"
#> env:  00000000125870E0

and using it in different places, applying to different data sets without the need to manually type in all the rules every time you need a complex mutation.
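As a sketch of that reuse, the rules generated by map2 can be stored once and spliced into several mutate calls. The data frame other and the names gear_rules, cars_labelled and other_labelled are made up for illustration.

```r
library(dplyr)
library(rlang)
library(purrr)

# Build the rules once from parallel vectors of values and labels
gear_rules <- map2(c(3, 4, 5), c("three", "four", "five"),
                   ~ quo(gear == !!.x ~ !!.y))

# Apply the same rules to mtcars ...
cars_labelled <- mtcars %>%
  mutate(test = case_when(!!!gear_rules))

# ... and to a second, hypothetical data set that also has a gear column
other <- data.frame(gear = c(5, 3, 4))
other_labelled <- other %>%
  mutate(test = case_when(!!!gear_rules))

other_labelled
```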

As a final answer to the problem, seven additional characters (!!!quos) and a pair of parentheses solve it:

library(rlang)
library(dplyr)
mtcars %>% 
    mutate(test = case_when(!!!quos(gear == 3L ~ "three", gear != 3L ~ "not three"))) %>% 
    head(10L)
#>     mpg cyl  disp  hp drat    wt  qsec vs am gear carb      test
#> 1  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4 not three
#> 2  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4 not three
#> 3  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1 not three
#> 4  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1     three
#> 5  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2     three
#> 6  18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1     three
#> 7  14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4     three
#> 8  24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2 not three
#> 9  22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2 not three
#> 10 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4 not three

Created on 2019-01-16 by the reprex package (v0.2.1.9000)

Solution 5 - R

library(dplyr) # load the dplyr package

# content150 is assumed to be an existing data frame with a column `number`
content150_fortified <- content150 %>% # store the result in a new data frame
  mutate(number_yn = case_when( # create the new column number_yn
    number >= 18 & number <= 25 ~ "no", # number between 18 and 25 (inclusive): "no"
    number != "none" ~ "yes" # any other value that is not "none": "yes"
  ))
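Since content150 is not shown in the answer, here is a minimal sketch with a made-up stand-in data frame (the values are assumptions for illustration only). It uses TRUE ~ "yes" as the catch-all branch, which is a common way to write a fallback in case_when and is a substitution for the number != "none" test above.

```r
library(dplyr)

# Hypothetical stand-in for the answerer's data
content150 <- data.frame(number = c(17, 18, 25, 30))

content150_fortified <- content150 %>%
  mutate(number_yn = case_when(
    number >= 18 & number <= 25 ~ "no",  # inside the 18-25 range
    TRUE ~ "yes"                         # every remaining row
  ))

content150_fortified
```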

Solution 6 - R

In addition to @akrun's answer above, be aware that when I tried it, the final closing parenthesis after case_when() could not be put on its own line.

For example, this works OK:

mtcars %>%  
   mutate(cg = case_when(
      .$carb <= 2 ~ "low",  .$carb > 2 ~ "high")) 

but this does not:

mtcars %>%  
   mutate(cg = case_when(
      .$carb <= 2 ~ "low",  .$carb > 2 ~ "high")
      ) 
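Whether the second form fails may depend on how the code is run, since R's parser normally keeps reading an expression until its parentheses are balanced. A minimal sketch for checking this on your own setup (the names same_line and own_line are illustrative, and a reasonably recent dplyr is assumed):

```r
library(dplyr)

# Final closing parenthesis on the same line as the case_when() rules
same_line <- parse(text = '
mtcars %>%
   mutate(cg = case_when(
      .$carb <= 2 ~ "low",  .$carb > 2 ~ "high"))
', keep.source = FALSE)[[1]]

# Final closing parenthesis moved onto its own line
own_line <- parse(text = '
mtcars %>%
   mutate(cg = case_when(
      .$carb <= 2 ~ "low",  .$carb > 2 ~ "high")
      )
', keep.source = FALSE)[[1]]

# Compare the parsed calls first, then the evaluated results
identical(same_line, own_line)
identical(eval(same_line), eval(own_line))
```

If both comparisons return TRUE, the placement of the parenthesis is not what causes the failure on your installation.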

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type      Original Author   Original Content on Stack Overflow
Question          tomw              View Question on Stack Overflow
Solution 1 - R    George Wood       View Answer on Stack Overflow
Solution 2 - R    akrun             View Answer on Stack Overflow
Solution 3 - R    tomw              View Answer on Stack Overflow
Solution 4 - R    Ilia              View Answer on Stack Overflow
Solution 5 - R    varun             View Answer on Stack Overflow
Solution 6 - R    hackR             View Answer on Stack Overflow