Replace all particular values in a data frame

RDataframeReplace

R Problem Overview


Having a data frame, how do I go about replacing all particular values along all rows and columns. Say for example I want to replace all empty records with NA's (without typing the positions):

df <- data.frame(list(A=c("", "xyz", "jkl"), B=c(12, "", 100)))

    A   B
1      12
2  xyz    
3  jkl 100

Expected result:

    A   B
1  NA   12
2  xyz  NA  
3  jkl  100

R Solutions


Solution 1 - R

Like this:

> df[df==""]<-NA
> df
     A    B
1 <NA>   12
2  xyz <NA>
3  jkl  100

Solution 2 - R

Since PikkuKatja and glallen asked for a more general solution and I cannot comment yet, I'll write an answer. You can combine statements as in:

> df[df=="" | df==12] <- NA
> df
     A    B
1  <NA> <NA>
2  xyz  <NA>
3  jkl  100

For factors, zxzak's code already yields factors:

> df <- data.frame(list(A=c("","xyz","jkl"), B=c(12,"",100)))
> str(df)
'data.frame':	3 obs. of  2 variables:
 $ A: Factor w/ 3 levels "","jkl","xyz": 1 3 2
 $ B: Factor w/ 3 levels "","100","12": 3 1 2

If in trouble, I'd suggest to temporarily drop the factors.

df[] <- lapply(df, as.character)

Solution 3 - R

Here are a couple dplyr options:

library(dplyr)

# all columns:
df %>% 
  mutate_all(~na_if(., ''))

# specific column types:
df %>% 
  mutate_if(is.factor, ~na_if(., ''))

# specific columns:  
df %>% 
  mutate_at(vars(A, B), ~na_if(., ''))

# or:
df %>% 
  mutate(A = replace(A, A == '', NA))

# replace can be used if you want something other than NA:
df %>% 
  mutate(A = as.character(A)) %>% 
  mutate(A = replace(A, A == '', 'used to be empty'))

Solution 4 - R

We can use data.table to get it quickly. First create df without factors,

df <- data.frame(list(A=c("","xyz","jkl"), B=c(12,"",100)), stringsAsFactors=F)

Now you can use

setDT(df)
for (jj in 1:ncol(df)) set(df, i = which(df[[jj]]==""), j = jj, v = NA)

and you can convert it back to a data.frame

setDF(df)

If you only want to use data.frame and keep factors it's more difficult, you need to work with

levels(df$value)[levels(df$value)==""] <- NA

where value is the name of every column. You need to insert it in a loop.

Solution 5 - R

If you want to replace multiple values in a data frame, looping through all columns might help.

Say you want to replace "" and 100:

na_codes <- c(100, "")
for (i in seq_along(df)) {
    df[[i]][df[[i]] %in% na_codes] <- NA
}

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionzxzakView Question on Stackoverflow
Solution 1 - RmripView Answer on Stackoverflow
Solution 2 - RsedotView Answer on Stackoverflow
Solution 3 - RsbhaView Answer on Stackoverflow
Solution 4 - RskanView Answer on Stackoverflow
Solution 5 - ROlivier MaView Answer on Stackoverflow