How to replace NaN value with zero in a huge data frame?


R Problem Overview


I tried to replace NaN values with zeros using the following script:

rapply( data123, f=function(x) ifelse(is.nan(x),0,x), how="replace" )
# [31]   0.00000000  -0.67994832   0.50287454   0.63979527   1.48410571  -2.90402836

The printed result showed zero in place of the NaN, but when I printed the data frame itself afterwards, the value was still NaN.

data123$contri_us
# [31]          NaN  -0.67994832   0.50287454   0.63979527   1.48410571  -2.90402836

I am not sure whether the rapply command actually modified the data frame, or only returned a modified copy that was printed to the console.

Any idea how to actually change the NaN value to zero?
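The behaviour described above is consistent with how rapply works: it returns a modified copy and leaves its input unchanged, so the result has to be assigned to keep it. A minimal sketch, assuming a small stand-in for data123 (the column names and values here are made up for illustration):

```r
# Hypothetical stand-in for the data frame in the question
data123 <- data.frame(contri_us = c(NaN, -0.68, 0.50),
                      other     = c(1, NaN, 3))

# rapply() returns a new object; the original is not modified in place
fixed <- rapply(data123, f = function(x) ifelse(is.nan(x), 0, x), how = "replace")

print(data123$contri_us)  # the original still contains NaN
print(fixed$contri_us)    # the copy has 0 where NaN used to be

# To keep the change under the original name, assign the result back:
# data123 <- fixed
```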

R Solutions


Solution 1 - R

It would seem that is.nan doesn't actually have a method for data frames, unlike is.na. So, let's fix that!

# S3 method: apply is.nan column by column and bind the results into a logical matrix
is.nan.data.frame <- function(x)
  do.call(cbind, lapply(x, is.nan))

data123[is.nan(data123)] <- 0
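
To see the method above in action, here is a small self-contained sketch. The data frame toy and its columns are hypothetical and not part of the original question; the point is that the mask is TRUE only for NaN, so a genuine NA is left alone.

```r
# S3 method so that is.nan() dispatches on data frames (same idea as above)
is.nan.data.frame <- function(x) {
  do.call(cbind, lapply(x, is.nan))
}

# Hypothetical sample data with both NaN and NA
toy <- data.frame(a = c(1, NaN, 3), b = c(NaN, NA, 6))

mask <- is.nan(toy)  # logical matrix with one column per data frame column
toy[mask] <- 0       # replaces NaN only; the NA in column b is untouched
print(toy)
```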

Solution 2 - R

In fact, in R, this operation is very easy:

If the matrix 'a' contains NaN values, the following code replaces them with 0:

a <- matrix(c(1, NaN, 2, NaN), ncol=2, nrow=2)
a[is.nan(a)] <- 0
a

If the data frame 'b' contains NaN values, the following code replaces them with 0:

# for a data.frame:
b <- data.frame(c1=c(1, NaN, 2), c2=c(NaN, 2, 7))
b[is.na(b)] <- 0
b

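Note the difference: is.nan is used for the matrix, but is.na for the data frame. Because is.na is TRUE for both NA and NaN, the data frame version also replaces any genuine NA values with 0.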

Doing

#...
b[is.nan(b)] <- 0
#...

yields the following error, because b is a data frame (internally a list of columns) and is.nan has no default method for lists:

Error in is.nan(b) : default method not implemented for type 'list'
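
If NA values need to survive, one way around the error is to call is.nan on each column rather than on the whole data frame. This is a sketch under the assumption that all columns are numeric; the data frame b below is an illustrative variant of the one above with an extra NA added.

```r
# Illustrative data: c2 contains both NaN and a genuine NA
b <- data.frame(c1 = c(1, NaN, 2), c2 = c(NaN, NA, 7))

# is.nan() works on each numeric column, so apply it column by column.
# Assigning into b[] keeps b a data frame instead of turning it into a list.
b[] <- lapply(b, function(col) replace(col, is.nan(col), 0))
print(b)
```

Unlike the is.na version, this leaves the NA in c2 in place.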


Solution 3 - R

The following should do what you want:

x <- data.frame(X1=sample(c(1:3,NaN), 200, replace=TRUE), X2=sample(c(4:6,NaN), 200, replace=TRUE))
head(x)
x <- replace(x, is.na(x), 0)
head(x)

Solution 4 - R

Here is a tidyverse solution. I've generated sample data with both NaN and NA. The first column is fully complete.

library(dplyr)  # provides tibble(), mutate_all() and %>%

df <- tibble(x = LETTERS[1:5],
             y = c(1:3, NaN, 4),
             z = c(rep(NaN, 3), NA, 5))

> df
# A tibble: 5 x 3
  x         y     z
  <chr> <dbl> <dbl>
1 A         1   NaN
2 B         2   NaN
3 C         3   NaN
4 D       NaN    NA
5 E         4     5

Then we can apply mutate_all with replace to the dataframe:

> df %>% 
+   mutate_all(~replace(., is.nan(.), 0))
# A tibble: 5 x 3
  x         y     z
  <chr> <dbl> <dbl>
1 A         1     0
2 B         2     0
3 C         3     0
4 D         0    NA 
5 E         4     5

We've replaced NaN values with zero and touched neither NA values nor the x column.
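
In dplyr 1.0.0 and later, mutate_all is superseded by across. The following is a minimal sketch of the same replacement in that style, assuming dplyr (version 1.0.0 or newer) is installed; restricting it to numeric columns with where(is.numeric) is a choice made here, not something the original answer does.

```r
library(dplyr)

# Same sample data as above: NaN and NA in the numeric columns
df <- tibble(x = LETTERS[1:5],
             y = c(1:3, NaN, 4),
             z = c(rep(NaN, 3), NA, 5))

# Replace NaN with 0 in numeric columns only; NA values are left as they are
out <- df %>%
  mutate(across(where(is.numeric), ~ replace(.x, is.nan(.x), 0)))

print(out)
```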

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stack Overflow
Question | cactussss | View Question on Stack Overflow
Solution 1 - R | Hong Ooi | View Answer on Stack Overflow
Solution 2 - R | leDjeg | View Answer on Stack Overflow
Solution 3 - R | Marc in the box | View Answer on Stack Overflow
Solution 4 - R | atsyplenkov | View Answer on Stack Overflow