Correct syntax for mutate_if

Tags: R, dplyr, NA

R Problem Overview


I would like to replace NA values with zeros via mutate_if in dplyr. The syntax below:

set.seed(1)
mtcars[sample(1:dim(mtcars)[1], 5),
       sample(1:dim(mtcars)[2], 5)] <-  NA

require(dplyr)

mtcars %>% 
    mutate_if(is.na,0)

mtcars %>% 
    mutate_if(is.na, funs(. = 0))

Returns the error:

Error in vapply(tbl, p, logical(1), ...) : values must be length 1,
 but FUN(X[[1]]) result is length 32

What's the correct syntax for this operation?

R Solutions


Solution 1 - R

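The "if" in mutate_if refers to choosing columns, not rows. For example, mutate_if(data, is.numeric, ...) applies a transformation to every numeric column in your dataset. The predicate is called once per column and must return a single TRUE or FALSE, which is.na does not: it returns one value per element.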

If you want to replace all NAs with zeros in numeric columns:

# funs() is deprecated since dplyr 0.8.0; a formula lambda does the same job
data %>% mutate_if(is.numeric, ~ ifelse(is.na(.), 0, .))
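To see why the original call fails, it can help to compare what each predicate returns for a single column. A minimal sketch in base R, using the built-in mtcars data (the names per_column, per_element and has_na are only for illustration):

```r
# mutate_if() calls the predicate once per column and expects a single TRUE/FALSE.
per_column  <- is.numeric(mtcars$mpg)  # one value for the whole column
per_element <- is.na(mtcars$mpg)       # one value per row

length(per_column)
length(per_element)  # one entry per row, hence "result is length 32" in the error

# Wrapping the element-wise test in any() turns it into a valid column predicate.
has_na <- function(x) any(is.na(x))
length(has_na(mtcars$mpg))
```

A predicate such as has_na could then be passed to mutate_if in place of is.na.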

Solution 2 - R

I learned this trick from the purrr tutorial, and it also works in dplyr. There are two ways to solve this problem.
First, define custom functions outside the pipe and use them in mutate_if():

any_column_NA <- function(x){
    any(is.na(x))
}
replace_NA_0 <- function(x){
    if_else(is.na(x),0,x)
}
mtcars %>% mutate_if(any_column_NA,replace_NA_0)

Second, write the functions inline as formulas, using ~ together with . or .x (.x can be replaced with ., but not with any other character or symbol):

mtcars %>% mutate_if(~ any(is.na(.x)),~ if_else(is.na(.x),0,.x))
#This also works
mtcars %>% mutate_if(~ any(is.na(.)),~ if_else(is.na(.),0,.))

In your case, you can also use mutate_all():

mtcars %>% mutate_all(~ if_else(is.na(.x),0,.x))

Using ~, we can define an anonymous function in which .x or . stands for the argument. In the mutate_if() case, . or .x is each column in turn.
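As a small illustration of the lambda form, the sketch below applies it to a toy data frame (df and its columns are made up for this example). Because if_else() checks that both branches have compatible types, the predicate here also requires the column to be numeric, so the character column is left alone.

```r
library(dplyr)

# Hypothetical toy data: two numeric columns and one character column.
df <- data.frame(
  a = c(1, NA, 3),
  b = c(4, 5, 6),
  c = c("x", NA, "z"),
  stringsAsFactors = FALSE
)

# Predicate lambda: TRUE only for numeric columns that contain at least one NA.
# Transformation lambda: replace NA with 0 and keep every other value.
result <- df %>%
  mutate_if(~ is.numeric(.x) && any(is.na(.x)),
            ~ if_else(is.na(.x), 0, .x))

result
```

Only column a is rewritten: b has no missing values and c is not numeric, so the predicate returns FALSE for both.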

Solution 3 - R

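library(tidyr)  # replace_na() comes from tidyr, not dplyr
mtcars %>% mutate_if(is.numeric, replace_na, 0)

or, with the more recent across() syntax (dplyr 1.0.0 and later):

# a lambda is used because passing extra arguments through across() is deprecated as of dplyr 1.1.0
mtcars %>% mutate(across(where(is.numeric),
                         ~ replace_na(.x, 0)))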
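A sketch of the across() form on a toy data frame with mixed column types (the tibble df is made up for illustration, and replace_na() is assumed to come from an installed tidyr). The call is written as a lambda so that the replacement value is passed to replace_na() directly.

```r
library(dplyr)
library(tidyr)  # provides replace_na()

# Hypothetical toy data: two numeric columns and one character column.
df <- tibble(
  x = c(1.5, NA, 3),
  y = c(NA, 2, 3),
  label = c("a", NA, "c")
)

# where(is.numeric) selects x and y; label is not touched.
result <- df %>%
  mutate(across(where(is.numeric), ~ replace_na(.x, 0)))

result
```

The missing value in label survives, which is usually what is wanted when 0 only makes sense for numeric data.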

Solution 4 - R

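We can use set() from data.table, which modifies the table in place:

library(data.table)
setDT(mtcars)  # convert to a data.table by reference
for (j in seq_along(mtcars)) {
  # overwrite the NA rows of column j with 0
  set(mtcars, i = which(is.na(mtcars[[j]])), j = j, value = 0)
}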

Solution 5 - R

I always struggle with the replace_na function (from tidyr). Base R's replace works inside a pipe:

  mtcars %>% replace(is.na(.), 0)

This works for me for what you are trying to do.
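The snippet above is written for use in a pipe. Outside a pipe, the equivalent base R call passes the data frame as the first argument. A minimal sketch on a made-up data frame df; note that this replaces NA in every column, not only in numeric ones:

```r
# Hypothetical toy data with missing values in both columns.
df <- data.frame(a = c(1, NA, 3), b = c(NA, 5, 6))

# is.na(df) is a logical matrix marking the missing cells;
# replace() assigns 0 to exactly those cells and returns the modified copy.
filled <- replace(df, is.na(df), 0)

filled
```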

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Konrad | View Question on Stackoverflow
Solution 1 - R | Hong Ooi | View Answer on Stackoverflow
Solution 2 - R | yusuzech | View Answer on Stackoverflow
Solution 3 - R | Nettle | View Answer on Stackoverflow
Solution 4 - R | akrun | View Answer on Stackoverflow
Solution 5 - R | ok1more | View Answer on Stackoverflow