Remove NA values from a vector

R

R Problem Overview


I have a huge vector which has a couple of NA values, and I'm trying to find the max value in that vector (the vector is all numbers), but I can't do this because of the NA values.

How can I remove the NA values so that I can compute the max?

R Solutions


Solution 1 - R

Trying ?max, you'll see that it actually has a na.rm = argument, set by default to FALSE. (That's the common default for many other R functions, including sum(), mean(), etc.)

Setting na.rm=TRUE does just what you're asking for:

d <- c(1, 100, NA, 10)
max(d, na.rm=TRUE)

If you do want to remove all of the NAs, use this idiom instead:

d <- d[!is.na(d)]

A final note: Other functions (e.g. table(), lm(), and sort()) have NA-related arguments that use different names (and offer different options). So if NA's cause you problems in a function call, it's worth checking for a built-in solution among the function's arguments. I've found there's usually one already there.

Solution 2 - R

The na.omit function is what a lot of the regression routines use internally:

vec <- 1:1000
vec[runif(200, 1, 1000)] <- NA
max(vec)
#[1] NA
max( na.omit(vec) )
#[1] 1000

Solution 3 - R

Use discard from purrr (works with lists and vectors).

discard(v, is.na) 

The benefit is that it is easy to use pipes; alternatively use the built-in subsetting function [:

v %>% discard(is.na)
v %>% `[`(!is.na(.))

Note that na.omit does not work on lists:

> x <- list(a=1, b=2, c=NA)
> na.omit(x)
$a
[1] 1

$b
[1] 2

$c
[1] NA

Solution 4 - R

?max shows you that there is an extra parameter na.rm that you can set to TRUE.

Apart from that, if you really want to remove the NAs, just use something like:

myvec[!is.na(myvec)]

Solution 5 - R

Just in case someone new to R wants a simplified answer to the original question

> How can I remove NA values from a vector?

Here it is:

Assume you have a vector foo as follows:

foo = c(1:10, NA, 20:30)

running length(foo) gives 22.

nona_foo = foo[!is.na(foo)]

length(nona_foo) is 21, because the NA values have been removed.

Remember is.na(foo) returns a boolean matrix, so indexing foo with the opposite of this value will give you all the elements which are not NA.

Solution 6 - R

You can call max(vector, na.rm = TRUE). More generally, you can use the na.omit() function.

Solution 7 - R

I ran a quick benchmark comparing the two base approaches and it turns out that x[!is.na(x)] is faster than na.omit. User qwr suggested I try purrr::dicard also - this turned out to be massively slower (though I'll happily take comments on my implementation & test!)

microbenchmark::microbenchmark(
  purrr::map(airquality,function(x) {x[!is.na(x)]}), 
  purrr::map(airquality,na.omit),
  purrr::map(airquality, ~purrr::discard(.x, .p = is.na)),
  times = 1e6)

Unit: microseconds
                                                     expr    min     lq      mean median      uq       max neval cld
 purrr::map(airquality, function(x) {     x[!is.na(x)] })   66.8   75.9  130.5643   86.2  131.80  541125.5 1e+06 a  
                          purrr::map(airquality, na.omit)   95.7  107.4  185.5108  129.3  190.50  534795.5 1e+06  b 
  purrr::map(airquality, ~purrr::discard(.x, .p = is.na)) 3391.7 3648.6 5615.8965 4079.7 6486.45 1121975.4 1e+06   c

For reference, here's the original test of x[!is.na(x)] vs na.omit:

microbenchmark::microbenchmark(
    purrr::map(airquality,function(x) {x[!is.na(x)]}), 
    purrr::map(airquality,na.omit), 
    times = 1000000)


Unit: microseconds
                                              expr  min   lq      mean median    uq      max neval cld
 map(airquality, function(x) {     x[!is.na(x)] }) 53.0 56.6  86.48231   58.1  64.8 414195.2 1e+06  a 
                          map(airquality, na.omit) 85.3 90.4 134.49964   92.5 104.9 348352.8 1e+06   b

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionCodeGuyView Question on Stackoverflow
Solution 1 - RJosh O'BrienView Answer on Stackoverflow
Solution 2 - RIRTFMView Answer on Stackoverflow
Solution 3 - RqwrView Answer on Stackoverflow
Solution 4 - RNick SabbeView Answer on Stackoverflow
Solution 5 - RScott C WilsonView Answer on Stackoverflow
Solution 6 - RMichael HoffmanView Answer on Stackoverflow
Solution 7 - RjsavnView Answer on Stackoverflow