How does one change the levels of a factor column in a data.table


R Problem Overview


What is the correct way to change the levels of a factor column in a data.table (note: not a data frame)?

  library(data.table)
  mydt <- data.table(id=1:6, value=as.factor(c("A", "A", "B", "B", "B", "C")), key="id")

  mydt[, levels(value)]
  [1] "A" "B" "C"

I am looking for something like:

mydt[, levels(value) <- c("X", "Y", "Z")]

But of course, the above line does not work.

    # Actual               # Expected result
    > mydt                  > mydt
       id value                id value
    1:  1     A             1:  1     X
    2:  2     A             2:  2     X
    3:  3     B             3:  3     Y
    4:  4     B             4:  4     Y
    5:  5     B             5:  5     Y
    6:  6     C             6:  6     Z

R Solutions


Solution 1 - R

You can still set them the traditional way:

levels(mydt$value) <- c(...)

This should be plenty fast unless mydt is very large, since that traditional syntax copies the entire object. You could also play the un-factoring and refactoring game... but no one likes that game anyway.

To change the levels by reference, with no copy of mydt:

setattr(mydt$value,"levels",c(...))

But be sure to assign a valid levels vector (a character vector of sufficient length), otherwise you'll end up with an invalid factor: levels<- does some checking as well as copying, whereas setattr does no checking at all.
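As a minimal sketch of the by-reference route, the snippet below fills in the elided c(...) with the X/Y/Z labels from the question and does by hand the checks that setattr skips. The name new_levels and the exact set of checks are illustrative assumptions, not part of the original answer.

```r
library(data.table)

mydt <- data.table(id = 1:6,
                   value = as.factor(c("A", "A", "B", "B", "B", "C")),
                   key = "id")

new_levels <- c("X", "Y", "Z")  # illustrative replacement labels

# setattr() performs no validation, so check by hand first:
# a character vector with one unique label per existing level.
stopifnot(is.character(new_levels),
          length(new_levels) == nlevels(mydt$value),
          !anyDuplicated(new_levels))

# Overwrite the "levels" attribute in place; mydt is not copied.
setattr(mydt$value, "levels", new_levels)

levels(mydt$value)
```

The labels are applied by position, so new_levels must be written in the same order as the existing levels(mydt$value).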

Solution 2 - R

I would rather go the traditional way and re-assign the factor levels one at a time:

> mydt$value # This is what we had originally
[1] A A B B B C
Levels: A B C
> levels(mydt$value) # just checking the levels
[1] "A" "B" "C"
# Meat of the re-assignment
> levels(mydt$value)[levels(mydt$value)=="A"] <- "X"
> levels(mydt$value)[levels(mydt$value)=="B"] <- "Y"
> levels(mydt$value)[levels(mydt$value)=="C"] <- "Z"
> levels(mydt$value)
[1] "X" "Y" "Z"
> mydt # This is what we wanted
   id value
1:  1     X
2:  2     X
3:  3     Y
4:  4     Y
5:  5     Y
6:  6     Z

As you probably noticed, the meat of the re-assignment is very intuitive: it checks for the exact level (use grepl if you need a fuzzy match, regular expressions or the like).

levels(mydt$value)[levels(mydt$value)=="A"] <- "X"

This explicitly checks the value in the levels of the variable under consideration and then reassigns X (and so on) to it. The advantage: you explicitly KNOW which level received which label.

I find renaming levels with levels(mydt$value) <- c("X","Y","Z") very non-intuitive, since it assigns the new labels purely by position in levels(mydt$value), which is sorted order by default and not the order in which the values appear in the data (so the order really matters).

PPS: In case of too many levels, use looping constructs.
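One way the looping suggestion could look is sketched below. It assumes the old-to-new mapping is kept in a named character vector; the name mapping is hypothetical and does not appear in the original answer.

```r
library(data.table)

mydt <- data.table(id = 1:6,
                   value = as.factor(c("A", "A", "B", "B", "B", "C")),
                   key = "id")

# Hypothetical lookup: names are the old levels, values are the new labels.
mapping <- c(A = "X", B = "Y", C = "Z")

# Rename one level per iteration, matching on the exact old label.
for (old in names(mapping)) {
  levels(mydt$value)[levels(mydt$value) == old] <- mapping[[old]]
}

levels(mydt$value)
```

Because the levels are renamed one after another, a mapping whose new labels overlap with old ones still waiting to be processed (for example A to B and B to C) would merge levels. This sketch assumes the two sets are disjoint.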

Solution 3 - R

You can also rename and add to your levels using a related approach, which can be very handy, especially if you are making a plot that needs more informative labels in a particular order (as opposed to the default):

f <- factor(c("a","b"))
levels(f) <- list(C = "C", D = "a", B = "b")

(modified from ?levels)
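Applied to the mydt table from the question, the same named-list form might be used as in the sketch below. The choice to put Z first is only an illustration of setting a non-default level order.

```r
library(data.table)

mydt <- data.table(id = 1:6,
                   value = as.factor(c("A", "A", "B", "B", "B", "C")),
                   key = "id")

# Each list name is a new label, each element the old level it replaces.
# The order of the list becomes the order of the new levels.
levels(mydt$value) <- list(Z = "C", X = "A", Y = "B")

levels(mydt$value)        # level order follows the list: Z, X, Y
as.character(mydt$value)  # each row keeps its own renamed label
```

Since every new label is written next to the old level it replaces, the mapping does not depend on the order of the existing levels.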

Solution 4 - R

This is safer than Matt Dowle's setattr suggestion (because it uses the checks skipped by setattr) but won't copy the entire data.table. It will replace the entire column vector (whereas Matt's solution only replaces the attributes of the column vector), but that seems like an acceptable trade-off in order to reduce the risk of messing up the factor object.

mydt[, value:=`levels<-`(value, c("X", "Y", "Z"))]
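To illustrate the checking that this form keeps, the sketch below first applies a valid levels vector and then tries one that is too short. The tryCatch wrapper and the name res are only there for demonstration.

```r
library(data.table)

mydt <- data.table(id = 1:6,
                   value = as.factor(c("A", "A", "B", "B", "B", "C")),
                   key = "id")

# Valid replacement: one label per existing level.
mydt[, value := `levels<-`(value, c("X", "Y", "Z"))]

# Too few labels: `levels<-` raises an error before := assigns anything.
res <- tryCatch(
  mydt[, value := `levels<-`(value, c("X", "Y"))],
  error = function(e) e
)

inherits(res, "error")  # TRUE: the bad levels vector was rejected
levels(mydt$value)      # unchanged by the failed attempt
```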

Solution 5 - R

Simplest way to reset a column's levels to the values actually present in it (this rebuilds the factor; it does not rename existing levels):

dat$colname <- as.factor(as.vector(dat$colname))
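A small sketch of what this rebuild does, assuming dat is a data.table with a factor column colname. The three-row table here is made up for illustration.

```r
library(data.table)

# Made-up example data: one factor column with three levels.
dat <- data.table(colname = factor(c("A", "B", "C")))

# After filtering, level "B" is unused but still listed.
dat <- dat[colname != "B"]
levels(dat$colname)

# Rebuilding the factor from its values keeps only the labels in use.
dat$colname <- as.factor(as.vector(dat$colname))
levels(dat$colname)
```

Base R's droplevels() covers the same case of discarding unused levels.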

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type      Original Author    Original Content on Stackoverflow
Question          Ricardo Saporta    View Question on Stackoverflow
Solution 1 - R    Justin             View Answer on Stackoverflow
Solution 2 - R    ekta               View Answer on Stackoverflow
Solution 3 - R    Bryan Hanson       View Answer on Stackoverflow
Solution 4 - R    Michael            View Answer on Stackoverflow
Solution 5 - R    Asher Schachter    View Answer on Stackoverflow