Removing NA observations with dplyr::filter()
R Problem Overview
My data looks like this:
library(tidyverse)
df <- tribble(
~a, ~b, ~c,
1, 2, 3,
1, NA, 3,
NA, 2, 3
)
I can remove all NA observations with drop_na():
df %>% drop_na()
Or remove all NA observations in a single column (a, for example):
df %>% drop_na(a)
Why can't I just use a regular != filter pipe?
df %>% filter(a != NA)
Why do we have to use a special function from tidyr to remove NAs?
R Solutions
Solution 1 - R
For example, you can use:
df %>% filter(!is.na(a))
to remove the rows where column a is NA.
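As a quick check, the sketch below rebuilds the df from the question and compares filter(!is.na(a)) with drop_na(a). It assumes dplyr, tidyr, and tibble are installed; the names by_filter, by_drop_na, and complete_ab are only illustrative.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
df <- tibble::tribble(
  ~a, ~b, ~c,
  1,  2,  3,
  1,  NA, 3,
  NA, 2,  3
)

# Both approaches keep the first two rows, where a is not missing
by_filter  <- df %>% filter(!is.na(a))
by_drop_na <- df %>% drop_na(a)

# Several conditions passed to filter() are combined with AND,
# so this keeps only rows where both a and b are present
complete_ab <- df %>% filter(!is.na(a), !is.na(b))
```

The filter() form is handy when the NA check is one of several conditions; drop_na() is shorter when dropping missing values is all you need.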
Solution 2 - R
If you are reading this in 2020 or later: once you have built your pipeline, you can pipe the result into na.exclude (%>% na.exclude), which drops every row that contains an NA in any column.
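A minimal sketch of this approach, using the df from the question. na.exclude() comes from the stats package that ships with R; the name cleaned is only illustrative. Note that it checks every column, so it behaves like drop_na() with no arguments rather than drop_na(a).

```r
library(dplyr)

# Same example data as in the question
df <- tibble::tribble(
  ~a, ~b, ~c,
  1,  2,  3,
  1,  NA, 3,
  NA, 2,  3
)

# na.exclude() removes every row with a missing value in any column,
# so only the first row survives here
cleaned <- df %>% na.exclude()
nrow(cleaned)
```

The result also carries an na.action attribute recording which rows were dropped, something drop_na() does not add.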
Solution 3 - R
From @Ben Bolker:
> [T]his has nothing specifically to do with dplyr::filter()
From @Marat Talipov:
> [A]ny comparison with NA, including NA==NA, will return NA
From a related answer by @farnsy:
> The == operator does not treat NA's as you would expect it to.
>
> Think of NA as meaning "I don't know what's there". The correct answer to 3 > NA is obviously NA because we don't know if the missing value is larger than 3 or not. Well, it's the same for NA == NA. They are both missing values but the true values could be quite different, so the correct answer is "I don't know."
>
> R doesn't know what you are doing in your analysis, so instead of potentially introducing bugs that would later end up being published and embarrassing you, it doesn't allow comparison operators to think NA is a value.
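The behaviour described in these quotes can be checked directly. The sketch below reuses the df from the question and assumes dplyr and tibble are installed; res_cmp and res_isna are illustrative names.

```r
library(dplyr)

# Same example data as in the question
df <- tibble::tribble(
  ~a, ~b, ~c,
  1,  2,  3,
  1,  NA, 3,
  NA, 2,  3
)

# Any comparison with NA is itself NA, never TRUE or FALSE
NA == NA
3 > NA
NA != NA

# So a != NA evaluates to NA for every row ...
df$a != NA

# ... and filter() keeps only rows where the condition is TRUE,
# which leaves zero rows
res_cmp <- df %>% filter(a != NA)
nrow(res_cmp)

# is.na() always returns TRUE or FALSE, so it works as a filter condition
res_isna <- df %>% filter(!is.na(a))
nrow(res_isna)
```

This is why filter(a != NA) silently returns an empty result instead of raising an error: the condition is valid R, it just never evaluates to TRUE.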
Solution 4 - R
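I always use this, and it works well for me. Here cool is my data frame and day is the column I want to clean:
# treat empty strings as missing
cool$day[cool$day==''] <- NA
# recode missing values as the string "NA"
cool$day[is.na(cool$day)] <- "NA"
# drop the rows whose day is the string "NA"
cool <- cool[!cool$day == "NA", ]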
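The same result can usually be reached in one step, without recoding missing values as the string "NA" (a step that would also drop any row where day genuinely contains the text "NA"). A minimal sketch in base R, using a made-up stand-in for the answerer's cool data frame:

```r
# Hypothetical stand-in for the answerer's data frame
cool <- data.frame(
  day   = c("Mon", "", NA, "Tue"),
  value = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# Keep rows where day is neither missing nor an empty string.
# !is.na() is FALSE for the NA row, and FALSE & NA is FALSE,
# so that row is dropped cleanly.
cool_clean <- cool[!is.na(cool$day) & cool$day != "", ]
cool_clean$day
```

Testing is.na() first matters: on its own, cool$day != "" is NA for the missing row, and indexing a data frame with an NA produces a row of NAs rather than dropping it.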