Subsetting R data frame results in mysterious NA rows

Tags: R, Subset, Reshape, NA

R Problem Overview


I've been encountering what I think is a bug. It's not a big deal, but I'm curious if anyone else has seen this. Unfortunately, my data is confidential, so I have to make up an example, and it's not going to be very helpful.

When subsetting my data, I occasionally get mysterious NA rows that aren't in my original data frame. Even the rownames are NA. For example:

example <- data.frame("var1"=c("A", "B", "A"), "var2"=c("X", "Y", "Z"))
example

  var1 var2
1    A    X
2    B    Y
3    A    Z

then I run:

example[example$var1=="A",]

   var1 var2
1     A    X
3     A    Z
NA <NA> <NA>

Of course, the example above does not actually give you this mysterious NA row; I am adding it here to illustrate the problem I'm having with my data.

Maybe it has to do with the fact that I'm importing my original data set using read.xlsx from the xlsx package (http://cran.r-project.org/web/packages/xlsx/xlsx.pdf) and then executing a wide-to-long reshape before subsetting.

Thanks

R Solutions


Solution 1 - R

Wrap the condition in which:

df[which(df$number1 < df$number2), ]

How it works:

which() returns the row numbers where the condition is TRUE, dropping the rows where it is FALSE or NA, and the data frame is then subset on those row numbers.

Say that:

which(df$number1 < df$number2)

returns row numbers 1, 2, 3, 4 and 5.

As such, writing:

df[which(df$number1 < df$number2), ]

is the same as writing:

df[c(1, 2, 3, 4, 5), ]

Or an even simpler version is:

df[1:5, ]
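
To see why this helps, it can be useful to look at the logical vector itself. The sketch below borrows the sample data frame from Solution 2 (an assumption, since this answer does not define df) and compares the raw condition with the output of which(). The names cond, idx and result are made up for illustration.

```r
# Sample data borrowed from Solution 2; this answer does not define df itself.
df <- data.frame(name = LETTERS[1:10], number1 = 1:10, number2 = c(10:3, NA, NA))

# The raw comparison is NA wherever number2 is NA (rows 9 and 10).
cond <- df$number1 < df$number2

# which() keeps only the positions that are TRUE; FALSE and NA are both dropped.
idx <- which(cond)

# Indexing by position therefore never sees an NA.
result <- df[idx, ]

print(cond)
print(idx)
print(result)
```

Printing cond shows the two NA entries that would otherwise turn into all-NA rows.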

Solution 2 - R

I see this was already answered by the OP, but since his comment is buried deep within the comment section, here's my attempt to fix this issue (at least with my data, which was behaving the same way).

First of all, some sample data:

> df <- data.frame(name = LETTERS[1:10], number1 = 1:10, number2 = c(10:3, NA, NA))
> df
   name number1 number2
1     A       1      10
2     B       2       9
3     C       3       8
4     D       4       7
5     E       5       6
6     F       6       5
7     G       7       4
8     H       8       3
9     I       9      NA
10    J      10      NA

Now for a simple filter:

> df[df$number1 < df$number2, ]
     name number1 number2
1       A       1      10
2       B       2       9
3       C       3       8
4       D       4       7
5       E       5       6
NA   <NA>      NA      NA
NA.1 <NA>      NA      NA

The problem here is that the NAs in the third column make the comparison itself evaluate to NA for rows 9 and 10, and R returns a row filled entirely with NA for every NA in a logical index. That is why the result has two extra rows named NA and NA.1. Here's my fix, which requires knowledge of which column contains the NAs:

> df[df$number1 < df$number2 & !is.na(df$number2), ]
  name number1 number2
1    A       1      10
2    B       2       9
3    C       3       8
4    D       4       7
5    E       5       6
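
If you do not know in advance which column holds the NAs, one option is to test the result of the comparison rather than the column. This is only a minimal sketch using the same sample data (rebuilt here so the block runs on its own); cond, keep and filtered are names made up for illustration.

```r
df <- data.frame(name = LETTERS[1:10], number1 = 1:10, number2 = c(10:3, NA, NA))

# The comparison yields NA for rows 9 and 10 because number2 is NA there.
cond <- df$number1 < df$number2

# Treat NA as "does not match": keep only rows where the condition is
# both non-missing and TRUE.
keep <- !is.na(cond) & cond

filtered <- df[keep, ]
print(filtered)
```

Because keep contains no NA, the bracket subset cannot produce the all-NA rows, whichever column the missing values came from.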

Solution 3 - R

I get the same problem when using code similar to what you posted. If I use the function subset() instead:

subset(example, var1 == "A")

the NA row gets excluded.
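
This matches the documented behaviour of subset(), where missing values in the condition are taken as FALSE. A small sketch, with an NA deliberately placed in var1 to trigger the problem (an assumption, since the question's example has none); with_brackets and with_subset are illustrative names.

```r
# The question's example, with an NA added to var1 to trigger the problem.
example <- data.frame(var1 = c("A", NA, "A"), var2 = c("X", "Y", "Z"))

# Bracket subsetting returns an all-NA row for the NA in the condition.
with_brackets <- example[example$var1 == "A", ]

# subset() treats the NA in the condition as FALSE and drops that row.
with_subset <- subset(example, var1 == "A")

print(with_brackets)
print(with_subset)
```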

Solution 4 - R

Using dplyr:

library(dplyr)
filter(df, number1 < number2)

Solution 5 - R

I find that using %in% instead of == can solve this issue, although I am still wondering why. For example, instead of df[df$num == 1, ], using df[df$num %in% c(1), ] will work.
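
A likely explanation is that == propagates missing values, while %in% is built on match() and always returns TRUE or FALSE. A short sketch with a made-up num vector (hypothetical data, not from the original post):

```r
# Hypothetical column with one missing value.
num <- c(1, NA, 2, 1)

# == returns NA where num is NA; used as a row index, that NA
# becomes an all-NA row.
eq_result <- num == 1

# %in% never returns NA; a missing value simply does not match 1.
in_result <- num %in% c(1)

print(eq_result)
print(in_result)
```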

Solution 6 - R

> example <- data.frame("var1"=c("A", NA, "A"), "var2"=c("X", "Y", "Z"))
> example
  var1 var2
1    A    X
2 <NA>    Y
3    A    Z
> example[example$var1=="A",]
   var1 var2
1     A    X
NA <NA> <NA>
3     A    Z

This is probably the result you are seeing. Try wrapping the condition in which() to avoid the NA rows:

> example[which(example$var1=="A"),]
  var1 var2
1    A    X
3    A    Z

Solution 7 - R

Another cause may be that you get the condition wrong, such as checking if a factor column is equal to a value that is not among its levels. This troubled me for a while.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | chrisg | View Question on Stackoverflow
Solution 1 - R | c-urchin | View Answer on Stackoverflow
Solution 2 - R | Waldir Leoncio | View Answer on Stackoverflow
Solution 3 - R | user3612472 | View Answer on Stackoverflow
Solution 4 - R | Victor Yan | View Answer on Stackoverflow
Solution 5 - R | Katherine Li | View Answer on Stackoverflow
Solution 6 - R | Jeyanthpranav Suresh | View Answer on Stackoverflow
Solution 7 - R | Jan Šimbera | View Answer on Stackoverflow