Warning message: In `...` : invalid factor level, NA generated

RWarningsR Faq

R Problem Overview


I don't understand why I got this warning message.

> fixed <- data.frame("Type" = character(3), "Amount" = numeric(3))
> fixed[1, ] <- c("lunch", 100)
Warning message:
In `[<-.factor`(`*tmp*`, iseq, value = "lunch") :
  invalid factor level, NA generated
> fixed
  Type Amount
1 <NA>    100
2           0
3           0

R Solutions


Solution 1 - R

The warning message is because your "Type" variable was made a factor and "lunch" was not a defined level. Use the stringsAsFactors = FALSE flag when making your data frame to force "Type" to be a character.

> fixed <- data.frame("Type" = character(3), "Amount" = numeric(3))
> str(fixed)
'data.frame':	3 obs. of  2 variables:
 $ Type  : Factor w/ 1 level "": NA 1 1
 $ Amount: chr  "100" "0" "0"
> 
> fixed <- data.frame("Type" = character(3), "Amount" = numeric(3),stringsAsFactors=FALSE)
> fixed[1, ] <- c("lunch", 100)
> str(fixed)
'data.frame':	3 obs. of  2 variables:
 $ Type  : chr  "lunch" "" ""
 $ Amount: chr  "100" "0" "0"

Solution 2 - R

If you are reading directly from CSV file then do like this.

myDataFrame <- read.csv("path/to/file.csv", header = TRUE, stringsAsFactors = FALSE)

Solution 3 - R

Here is a flexible approach, it can be used in all cases, in particular:

  1. to affect only one column, or
  2. the dataframe has been obtained from applying previous operations (e.g. not immediately opening a file, or creating a new data frame).

First, un-factorize a string using the as.character function, and, then, re-factorize with the as.factor (or simply factor) function:

fixed <- data.frame("Type" = character(3), "Amount" = numeric(3))

# Un-factorize (as.numeric can be use for numeric values)
#              (as.vector  can be use for objects - not tested)
fixed$Type <- as.character(fixed$Type)
fixed[1, ] <- c("lunch", 100)

# Re-factorize with the as.factor function or simple factor(fixed$Type)
fixed$Type <- as.factor(fixed$Type)

Solution 4 - R

The easiest way to fix this is to add a new factor to your column. Use the levels function to determine how many factors you have and then add a new factor.

    > levels(data$Fireplace.Qu)
    [1] "Ex" "Fa" "Gd" "Po" "TA"
    > levels(data$Fireplace.Qu) = c("Ex", "Fa", "Gd", "Po", "TA", "None")
    [1] "Ex"   "Fa"   "Gd"   "Po"   " TA"  "None"
    

Solution 5 - R

I have got similar issue which data retrieved from .xlsx file. Unfortunately, I could not find the proper answer here. I handled it on my own with dplyr as below which might help others:

#install.packages("xlsx")
library(xlsx)
extracted_df <- read.xlsx("test.xlsx", sheetName='Sheet1', stringsAsFactors=FALSE)
# Replace all NAs in a data frame with "G" character
extracted_df[is.na(extracted_df)] <- "G"

However, I could not handle it with the readxl package which does not have similar parameter to the stringsAsFactors. For the reason, I have moved to the xlsx package.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionihmView Question on Stackoverflow
Solution 1 - RDavidView Answer on Stackoverflow
Solution 2 - RChiragView Answer on Stackoverflow
Solution 3 - Rtoto_ticoView Answer on Stackoverflow
Solution 4 - REddie MillerView Answer on Stackoverflow
Solution 5 - RozturkibView Answer on Stackoverflow