Change the class from factor to numeric of many columns in a data frame

R

R Problem Overview


What is the quickest/best way to change a large number of columns to numeric from factor?

I used the following code but it appears to have re-ordered my data.

> head(stats[,1:2])
  rk                 team
1  1 Washington Capitals*
2  2     San Jose Sharks*
3  3  Chicago Blackhawks*
4  4     Phoenix Coyotes*
5  5   New Jersey Devils*
6  6   Vancouver Canucks*

for(i in c(1,3:ncol(stats))) {
	stats[,i] <- as.numeric(stats[,i])
}

> head(stats[,1:2])
  rk                 team
1  2 Washington Capitals*
2 13     San Jose Sharks*
3 24  Chicago Blackhawks*
4 26     Phoenix Coyotes*
5 27   New Jersey Devils*
6 28   Vancouver Canucks*

What is the best way, short of naming every column as in:

df$colname <- as.numeric(ds$colname)

R Solutions


Solution 1 - R

You have to be careful while changing factors to numeric. Here is a line of code that would change a set of columns from factor to numeric. I am assuming here that the columns to be changed to numeric are 1, 3, 4 and 5 respectively. You could change it accordingly

cols = c(1, 3, 4, 5);    
df[,cols] = apply(df[,cols], 2, function(x) as.numeric(as.character(x)));

Solution 2 - R

Further to Ramnath's answer, the behaviour you are experiencing is that due to as.numeric(x) returning the internal, numeric representation of the factor x at the R level. If you want to preserve the numbers that are the levels of the factor (rather than their internal representation), you need to convert to character via as.character() first as per Ramnath's example.

Your for loop is just as reasonable as an apply call and might be slightly more readable as to what the intention of the code is. Just change this line:

stats[,i] <- as.numeric(stats[,i])

to read

stats[,i] <- as.numeric(as.character(stats[,i]))

This is FAQ 7.10 in the R FAQ.

HTH

Solution 3 - R

This can be done in one line, there's no need for a loop, be it a for-loop or an apply. Use unlist() instead :

# testdata
Df <- data.frame(
  x = as.factor(sample(1:5,30,r=TRUE)),
  y = as.factor(sample(1:5,30,r=TRUE)),
  z = as.factor(sample(1:5,30,r=TRUE)),
  w = as.factor(sample(1:5,30,r=TRUE))
)
##

Df[,c("y","w")] <- as.numeric(as.character(unlist(Df[,c("y","w")])))

str(Df)

Edit : for your code, this becomes :

id <- c(1,3:ncol(stats))) 
stats[,id] <- as.numeric(as.character(unlist(stats[,id])))

Obviously, if you have a one-column data frame and you don't want the automatic dimension reduction of R to convert it to a vector, you'll have to add the drop=FALSE argument.

Solution 4 - R

I know this question is long resolved, but I recently had a similar issue and think I've found a little more elegant and functional solution, although it requires the magrittr package.

library(magrittr)
cols = c(1, 3, 4, 5)
df[,cols] %<>% lapply(function(x) as.numeric(as.character(x)))

The %<>% operator pipes and reassigns, which is very useful for keeping data cleaning and transformation simple. Now the list apply function is much easier to read, by only specifying the function you wish to apply.

Solution 5 - R

Here are some dplyr options:

# by column type:
df %>% 
  mutate_if(is.factor, ~as.numeric(as.character(.)))

# by specific columns:
df %>% 
  mutate_at(vars(x, y, z), ~as.numeric(as.character(.))) 

# all columns:
df %>% 
  mutate_all(~as.numeric(as.character(.))) 

Solution 6 - R

I think that [ucfagls found why](https://stackoverflow.com/questions/3796266/change-the-class-of-many-columns-in-a-data-frame/3797343#3797343 "ucfagls anwser") your loop is not working.

In case you still don't want use a loop here is solution with lapply:

factorToNumeric <- function(f) as.numeric(levels(f))[as.integer(f)] 
cols <- c(1, 3:ncol(stats))
stats[cols] <- lapply(stats[cols], factorToNumeric)

Edit. I found simpler solution. It seems that as.matrix convert to character. So

stats[cols] <- as.numeric(as.matrix(stats[cols]))

should do what you want.

Solution 7 - R

lapply is pretty much designed for this

unfactorize<-c("colA","colB")
df[,unfactorize]<-lapply(unfactorize, function(x) as.numeric(as.character(df[,x])))

Solution 8 - R

I found this function on a couple other duplicate threads and have found it an elegant and general way to solve this problem. This thread shows up first on most searches on this topic, so I am sharing it here to save folks some time. I take no credit for this just so see the original posts here and here for details.

df <- data.frame(x = 1:10,
                 y = rep(1:2, 5),
                 k = rnorm(10, 5,2),
                 z = rep(c(2010, 2012, 2011, 2010, 1999), 2),
                 j = c(rep(c("a", "b", "c"), 3), "d"))

convert.magic <- function(obj, type){
  FUN1 <- switch(type,
                 character = as.character,
                 numeric = as.numeric,
                 factor = as.factor)
  out <- lapply(obj, FUN1)
  as.data.frame(out)
}

str(df)
str(convert.magic(df, "character"))
str(convert.magic(df, "factor"))
df[, c("x", "y")] <- convert.magic(df[, c("x", "y")], "factor")

Solution 9 - R

I would like to point out that if you have NAs in any column, simply using subscripts will not work. If there are NAs in the factor, you must use the apply script provided by Ramnath.

E.g.

Df <- data.frame(
  x = c(NA,as.factor(sample(1:5,30,r=T))),
  y = c(NA,as.factor(sample(1:5,30,r=T))),
  z = c(NA,as.factor(sample(1:5,30,r=T))),
  w = c(NA,as.factor(sample(1:5,30,r=T)))
)

Df[,c(1:4)] <- as.numeric(as.character(Df[,c(1:4)]))

Returns the following:

Warning message:
NAs introduced by coercion 

    > head(Df)
       x  y  z  w
    1 NA NA NA NA
    2 NA NA NA NA
    3 NA NA NA NA
    4 NA NA NA NA
    5 NA NA NA NA
    6 NA NA NA NA

But:

Df[,c(1:4)]= apply(Df[,c(1:4)], 2, function(x) as.numeric(as.character(x)))

Returns:

> head(Df)
   x  y  z  w
1 NA NA NA NA
2  2  3  4  1
3  1  5  3  4
4  2  3  4  1
5  5  3  5  5
6  4  2  4  4

Solution 10 - R

you can use unfactor() function from "varhandle" package form CRAN:

library("varhandle")

my_iris <- data.frame(Sepal.Length = factor(iris$Sepal.Length),
                      sample_id = factor(1:nrow(iris)))

my_iris <- unfactor(my_iris)

Solution 11 - R

I like this code because it's pretty handy:

  data[] <- lapply(data, function(x) type.convert(as.character(x), as.is = TRUE)) #change all vars to their best fitting data type

It is not exactly what was asked for (convert to numeric), but in many cases even more appropriate.

Solution 12 - R

I tried a bunch of these on a similar problem and kept getting NAs. Base R has some really irritating coercion behaviors, which are generally fixed in Tidyverse packages. I used to avoid them because I didn't want to create dependencies, but they make life so much easier that now I don't even bother trying to figure out the Base R solution most of the time.

Here's the Tidyverse solution, which is extremely simple and elegant:

library(purrr)

mydf <- data.frame(
  x1 = factor(c(3, 5, 4, 2, 1)),
  x2 = factor(c("A", "C", "B", "D", "E")),
  x3 = c(10, 8, 6, 4, 2))

map_df(mydf, as.numeric)

Solution 13 - R

df$colname <- as.numeric(df$colname)

I tried this way for changing one column type and I think it is better than many other versions, if you are not going to change all column types

df$colname <- as.character(df$colname)

for the vice versa.

Solution 14 - R

I had problems converting all columns to numeric with an apply() call:

apply(data, 2, as.numeric)

The problem turns out to be because some of the strings had a comma in them -- e.g. "1,024.63" instead of "1024.63" -- and R does not like this way of formatting numbers. So I removed them and then ran as.numeric():

data = as.data.frame(apply(data, 2, function(x) {
  y = str_replace_all(x, ",", "") #remove commas
  return(as.numeric(y)) #then convert
}))

Note that this requires the stringr package to be loaded.

Solution 15 - R

That's what's worked for me. The apply() function tries to coerce df to matrix and it returns NA's.

numeric.df <- as.data.frame(sapply(df, 2, as.numeric))

Solution 16 - R

Based on @SDahm's answer, this was an "optimal" solution for my tibble:

data %<>% lapply(type.convert) %>% as.data.table()

This requires dplyr and magrittr.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionBtibert3View Question on Stackoverflow
Solution 1 - RRamnathView Answer on Stackoverflow
Solution 2 - RGavin SimpsonView Answer on Stackoverflow
Solution 3 - RJoris MeysView Answer on Stackoverflow
Solution 4 - RDanView Answer on Stackoverflow
Solution 5 - RsbhaView Answer on Stackoverflow
Solution 6 - RMarekView Answer on Stackoverflow
Solution 7 - RtranscomView Answer on Stackoverflow
Solution 8 - RElectioneerView Answer on Stackoverflow
Solution 9 - RElizabethView Answer on Stackoverflow
Solution 10 - RMehrad MahmoudianView Answer on Stackoverflow
Solution 11 - RSDahmView Answer on Stackoverflow
Solution 12 - RAaron CooleyView Answer on Stackoverflow
Solution 13 - Rhuseyn rahimovView Answer on Stackoverflow
Solution 14 - RCoderGuy123View Answer on Stackoverflow
Solution 15 - RAlina ShabatovView Answer on Stackoverflow
Solution 16 - RJames HirschornView Answer on Stackoverflow