Omit rows containing NA in specific columns

R, Dataframe, NA

R Problem Overview


I want to know how to omit rows with NA values from a data frame, but only when the NA is in certain columns I am interested in.

For example,

DF <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA), z=c(NA, 33, 22))

But I only want to omit the rows where y is NA, so the result should be:

  x  y  z
1 1  0 NA
2 2 10 33

na.omit seems to delete all rows that contain any NA.

Can somebody help me with this simple question?

Now suppose I change the question to:

DF <- data.frame(x = c(1, 2, 3, NA), y = c(1, 0, 10, NA), z = c(43, NA, 33, NA))

If I want to omit only the rows where x is NA or z is NA, where do I put the | in the function call?

R Solutions


Solution 1 - R

Use is.na

DF <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA), z=c(NA, 33, 22))
DF[!is.na(DF$y),]
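
The same test extends to several columns at once. A minimal sketch, assuming the second data frame from the question (called DF2 here purely for illustration) and that rows with NA in x or z should be dropped:

```r
# Second example from the question; the name DF2 is an assumption for illustration
DF2 <- data.frame(x = c(1, 2, 3, NA), y = c(1, 0, 10, NA), z = c(43, NA, 33, NA))

cols <- c("x", "z")

# is.na() on a data frame gives a logical matrix; rowSums() counts the NAs per row
keep <- rowSums(is.na(DF2[cols])) == 0

result <- DF2[keep, ]
print(result)
```

Changing cols is enough to test a different set of columns, and NAs in any other column (y here) are ignored.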

Solution 2 - R

Hadley's tidyr has a function for exactly this, drop_na:

library(tidyr)
DF %>% drop_na(y)
#   x  y  z
# 1 1  0 NA
# 2 2 10 33

Solution 3 - R

You could use the complete.cases function and put it into a function thusly:

DF <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA), z=c(NA, 33, 22))

completeFun <- function(data, desiredCols) {
  completeVec <- complete.cases(data[, desiredCols])
  return(data[completeVec, ])
}

completeFun(DF, "y")
#   x  y  z
# 1 1  0 NA
# 2 2 10 33

completeFun(DF, c("y", "z"))
#   x  y  z
# 2 2 10 33

EDIT: Only return rows with no NAs

If you want to eliminate all rows with at least one NA in any column, just use the complete.cases function straight up:

DF[complete.cases(DF), ]
#   x  y  z
# 2 2 10 33

Or if completeFun is already ingrained in your workflow ;)

completeFun(DF, names(DF))
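
To see what completeFun relies on, it can help to look at the logical vector that complete.cases returns. A small sketch on the same DF; the variable names ok_yz and ok_y are only illustrative, and drop = FALSE is added here as a precaution so that a single selected column stays a data frame:

```r
DF <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA), z = c(NA, 33, 22))

# One logical per row: TRUE only if every selected column is non-NA in that row
ok_yz <- complete.cases(DF[, c("y", "z")])
print(ok_yz)

# drop = FALSE keeps a one-column selection as a data frame rather than a vector
ok_y <- complete.cases(DF[, "y", drop = FALSE])
print(ok_y)
```

Indexing DF with either vector, as in DF[ok_y, ], then keeps exactly the rows marked TRUE.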

Solution 4 - R

Use 'subset'

DF <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA), z=c(NA, 33, 22))
subset(DF, !is.na(y))
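
One reason subset is convenient around missing values: a condition that evaluates to NA is treated as FALSE, whereas plain [ indexing with an NA in the logical index returns a row of NAs. A short sketch of the difference, using a y > 0 condition chosen only for illustration:

```r
DF <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA), z = c(NA, 33, 22))

# DF$y > 0 is FALSE, TRUE, NA; the NA index yields an all-NA row
by_bracket <- DF[DF$y > 0, ]

# subset() drops rows whose condition is NA
by_subset <- subset(DF, y > 0)

print(nrow(by_bracket))
print(nrow(by_subset))
```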

Solution 5 - R

If your data is a data.table, its na.omit method takes a cols argument:

na.omit(data, cols = c("x", "z"))

Solution 6 - R

Omit a row if either of two specific columns contains NA.

DF[!is.na(DF$x) & !is.na(DF$z), ]
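
For the follow-up in the question (where the | goes), one option is to combine the two is.na tests with | and negate the whole expression, which by De Morgan's law is equivalent to the & form above. A sketch on the question's second data frame (named DF2 here only for illustration):

```r
DF2 <- data.frame(x = c(1, 2, 3, NA), y = c(1, 0, 10, NA), z = c(43, NA, 33, NA))

# Drop a row if x is NA OR z is NA
with_or <- DF2[!(is.na(DF2$x) | is.na(DF2$z)), ]

# Equivalent: keep a row if x is not NA AND z is not NA
with_and <- DF2[!is.na(DF2$x) & !is.na(DF2$z), ]

print(with_or)
```

The parentheses around the | expression matter: without them, ! would apply only to the first is.na call.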

Solution 7 - R

Try this:

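cc <- is.na(DF$y)                   # TRUE where y is NA
m <- which(cc)                      # indices of rows to drop
if (length(m) > 0) DF <- DF[-m, ]   # skip when empty: DF[-integer(0), ] has zero rows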
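
A caveat with negative indexing via which(): if no element matches, which() returns integer(0), and indexing with -integer(0) selects zero rows instead of all of them. A small sketch of the pitfall next to a logical-index alternative (the data frame clean is made up for illustration):

```r
# A data frame with no NA in y (illustrative data)
clean <- data.frame(x = c(1, 2), y = c(5, 6))

m <- which(is.na(clean$y))        # integer(0): nothing to drop

bad  <- clean[-m, ]               # -integer(0) is integer(0), so zero rows come back
good <- clean[!is.na(clean$y), ]  # logical indexing keeps every row

print(nrow(bad))
print(nrow(good))
```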

Solution 8 - R

As an update, here is a tidyverse approach with dplyr (your_data_frame and region_column stand in for your own data frame and column):

library(dplyr)

your_data_frame %>% 
  filter(!is.na(region_column))

Solution 9 - R

Just try this:

DF %>% t %>% na.omit %>% t

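This transposes the data frame, omits the rows containing NA (which were columns before transposition), and then transposes it back. Note that it therefore drops columns that contain any NA, not rows, and that the result is a matrix rather than a data frame.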
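
To check what this pipeline does, here is a base-R equivalent written without the pipe and run on the DF from the question. On this data it keeps only the column that has no NA:

```r
DF <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA), z = c(NA, 33, 22))

# Same steps as DF %>% t %>% na.omit %>% t, written as nested calls
res <- t(na.omit(t(DF)))

print(colnames(res))   # columns that survived
print(is.matrix(res))  # the result is no longer a data frame
```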

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | user1489975 | View Question on Stackoverflow
Solution 1 - R | mnel | View Answer on Stackoverflow
Solution 2 - R | amrrs | View Answer on Stackoverflow
Solution 3 - R | BenBarnes | View Answer on Stackoverflow
Solution 4 - R | Rnoob | View Answer on Stackoverflow
Solution 5 - R | Droney | View Answer on Stackoverflow
Solution 6 - R | M.Viking | View Answer on Stackoverflow
Solution 7 - R | rockswap | View Answer on Stackoverflow
Solution 8 - R | Vinícius Félix | View Answer on Stackoverflow
Solution 9 - R | Luchao Qi | View Answer on Stackoverflow