data.frame rows to a list

ListRDataframe

List Problem Overview


I have a data.frame which I would like to convert to a list by rows, meaning each row would correspond to its own list elements. In other words, I would like a list that is as long as the data.frame has rows.

So far, I've tackled this problem in the following manner, but I was wondering if there's a better way to approach this.

xy.df <- data.frame(x = runif(10),  y = runif(10))

# pre-allocate a list and fill it with a loop
xy.list <- vector("list", nrow(xy.df))
for (i in 1:nrow(xy.df)) {
    xy.list[[i]] <- xy.df[i,]
}

List Solutions


Solution 1 - List

Like this:

xy.list <- split(xy.df, seq(nrow(xy.df)))

And if you want the rownames of xy.df to be the names of the output list, you can do:

xy.list <- setNames(split(xy.df, seq(nrow(xy.df))), rownames(xy.df))

Solution 2 - List

Eureka!

xy.list <- as.list(as.data.frame(t(xy.df)))

Solution 3 - List

If you want to completely abuse the data.frame (as I do) and like to keep the $ functionality, one way is to split you data.frame into one-line data.frames gathered in a list :

> df = data.frame(x=c('a','b','c'), y=3:1)
> df
  x y
1 a 3
2 b 2
3 c 1

# 'convert' into a list of data.frames
ldf = lapply(as.list(1:dim(df)[1]), function(x) df[x[1],])

> ldf
[[1]]
x y
1 a 3    
[[2]]
x y
2 b 2
[[3]]
x y
3 c 1

# and the 'coolest'
> ldf[[2]]$y
[1] 2

It is not only intellectual masturbation, but allows to 'transform' the data.frame into a list of its lines, keeping the $ indexation which can be useful for further use with lapply (assuming the function you pass to lapply uses this $ indexation)

Solution 4 - List

A more modern solution uses only purrr::transpose:

library(purrr)
iris[1:2,] %>% purrr::transpose()
#> [[1]]
#> [[1]]$Sepal.Length
#> [1] 5.1
#> 
#> [[1]]$Sepal.Width
#> [1] 3.5
#> 
#> [[1]]$Petal.Length
#> [1] 1.4
#> 
#> [[1]]$Petal.Width
#> [1] 0.2
#> 
#> [[1]]$Species
#> [1] 1
#> 
#> 
#> [[2]]
#> [[2]]$Sepal.Length
#> [1] 4.9
#> 
#> [[2]]$Sepal.Width
#> [1] 3
#> 
#> [[2]]$Petal.Length
#> [1] 1.4
#> 
#> [[2]]$Petal.Width
#> [1] 0.2
#> 
#> [[2]]$Species
#> [1] 1

Solution 5 - List

A couple of more options :

With asplit

asplit(xy.df, 1)
#[[1]]
#     x      y 
#0.1137 0.6936 

#[[2]]
#     x      y 
#0.6223 0.5450 

#[[3]]
#     x      y 
#0.6093 0.2827 
#....

With split and row

split(xy.df, row(xy.df)[, 1])

#$`1`
#       x      y
#1 0.1137 0.6936

#$`2`
#       x     y
#2 0.6223 0.545

#$`3`
#       x      y
#3 0.6093 0.2827
#....

data

set.seed(1234)
xy.df <- data.frame(x = runif(10),  y = runif(10))

Solution 6 - List

I was working on this today for a data.frame (really a data.table) with millions of observations and 35 columns. My goal was to return a list of data.frames (data.tables) each with a single row. That is, I wanted to split each row into a separate data.frame and store these in a list.

Here are two methods I came up with that were roughly 3 times faster than split(dat, seq_len(nrow(dat))) for that data set. Below, I benchmark the three methods on a 7500 row, 5 column data set (iris repeated 50 times).

library(data.table)
library(microbenchmark)

microbenchmark(
split={dat1 <- split(dat, seq_len(nrow(dat)))},
setDF={dat2 <- lapply(seq_len(nrow(dat)),
                  function(i) setDF(lapply(dat, "[", i)))},
attrDT={dat3 <- lapply(seq_len(nrow(dat)),
           function(i) {
             tmp <- lapply(dat, "[", i)
             attr(tmp, "class") <- c("data.table", "data.frame")
             setDF(tmp)
           })},
datList = {datL <- lapply(seq_len(nrow(dat)),
                          function(i) lapply(dat, "[", i))},
times=20
) 

This returns

Unit: milliseconds
       expr      min       lq     mean   median        uq       max neval
      split 861.8126 889.1849 973.5294 943.2288 1041.7206 1250.6150    20
      setDF 459.0577 466.3432 511.2656 482.1943  500.6958  750.6635    20
     attrDT 399.1999 409.6316 461.6454 422.5436  490.5620  717.6355    20
    datList 192.1175 201.9896 241.4726 208.4535  246.4299  411.2097    20

While the differences are not as large as in my previous test, the straight setDF method is significantly faster at all levels of the distribution of runs with max(setDF) < min(split) and the attr method is typically more than twice as fast.

A fourth method is the extreme champion, which is a simple nested lapply, returning a nested list. This method exemplifies the cost of constructing a data.frame from a list. Moreover, all methods I tried with the data.frame function were roughly an order of magnitude slower than the data.table techniques.

data

dat <- vector("list", 50)
for(i in 1:50) dat[[i]] <- iris
dat <- setDF(rbindlist(dat))

Solution 7 - List

Seems a current version of the purrr (0.2.2) package is the fastest solution:

by_row(x, function(v) list(v)[[1L]], .collate = "list")$.out

Let's compare the most interesting solutions:

data("Batting", package = "Lahman")
x <- Batting[1:10000, 1:10]
library(benchr)
library(purrr)
benchmark(
    split = split(x, seq_len(.row_names_info(x, 2L))),
    mapply = .mapply(function(...) structure(list(...), class = "data.frame", row.names = 1L), x, NULL),
    purrr = by_row(x, function(v) list(v)[[1L]], .collate = "list")$.out
)

Rsults:

Benchmark summary:
Time units : milliseconds 
  expr n.eval   min  lw.qu median   mean  up.qu  max  total relative
 split    100 983.0 1060.0 1130.0 1130.0 1180.0 1450 113000     34.3
mapply    100 826.0  894.0  963.0  972.0 1030.0 1320  97200     29.3
 purrr    100  24.1   28.6   32.9   44.9   40.5  183   4490      1.0

Also we can get the same result with Rcpp:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
List df2list(const DataFrame& x) {
    std::size_t nrows = x.rows();
    std::size_t ncols = x.cols();
    CharacterVector nms = x.names();
    List res(no_init(nrows));
    for (std::size_t i = 0; i < nrows; ++i) {
        List tmp(no_init(ncols));
        for (std::size_t j = 0; j < ncols; ++j) {
            switch(TYPEOF(x[j])) {
                case INTSXP: {
                    if (Rf_isFactor(x[j])) {
                        IntegerVector t = as<IntegerVector>(x[j]);
                        RObject t2 = wrap(t[i]);
                        t2.attr("class") = "factor";
                        t2.attr("levels") = t.attr("levels");
                        tmp[j] = t2;
                    } else {
                        tmp[j] = as<IntegerVector>(x[j])[i];
                    }
                    break;
                }
                case LGLSXP: {
                    tmp[j] = as<LogicalVector>(x[j])[i];
                    break;
                }
                case CPLXSXP: {
                    tmp[j] = as<ComplexVector>(x[j])[i];
                    break;
                }
                case REALSXP: {
                    tmp[j] = as<NumericVector>(x[j])[i];
                    break;
                }
                case STRSXP: {
                    tmp[j] = as<std::string>(as<CharacterVector>(x[j])[i]);
                    break;
                }
                default: stop("Unsupported type '%s'.", type2name(x));
            }
        }
        tmp.attr("class") = "data.frame";
        tmp.attr("row.names") = 1;
        tmp.attr("names") = nms;
        res[i] = tmp;
    }
    res.attr("names") = x.attr("row.names");
    return res;
}

Now caompare with purrr:

benchmark(
    purrr = by_row(x, function(v) list(v)[[1L]], .collate = "list")$.out,
    rcpp = df2list(x)
)

Results:

Benchmark summary:
Time units : milliseconds 
 expr n.eval  min lw.qu median mean up.qu   max total relative
purrr    100 25.2  29.8   37.5 43.4  44.2 159.0  4340      1.1
 rcpp    100 19.0  27.9   34.3 35.8  37.2  93.8  3580      1.0

Solution 8 - List

The best way for me was:

Example data:

Var1<-c("X1",X2","X3")
Var2<-c("X1",X2","X3")
Var3<-c("X1",X2","X3")

Data<-cbind(Var1,Var2,Var3)

ID    Var1   Var2  Var3 
1      X1     X2    X3
2      X4     X5    X6
3      X7     X8    X9

We call the BBmisc library

library(BBmisc)

data$lists<-convertRowsToList(data[,2:4])

And the result will be:

ID    Var1   Var2  Var3  lists
1      X1     X2    X3   list("X1", "X2", X3") 
2      X4     X5    X6   list("X4","X5", "X6") 
3      X7     X8    X9   list("X7,"X8,"X9) 

Solution 9 - List

Like @flodel wrote: This converts your dataframe into a list that has the same number of elements as number of rows in dataframe:

NewList <- split(df, f = seq(nrow(df)))

You can additionaly add a function to select only those columns that are not NA in each element of the list:

NewList2 <- lapply(NewList, function(x) x[,!is.na(x)])

Solution 10 - List

An alternative way is to convert the df to a matrix then applying the list apply lappy function over it: ldf <- lapply(as.matrix(myDF), function(x)x)

Solution 11 - List

Another alternative using library(purrr) (that seems to be a bit quicker on large data.frames)

flatten(by_row(xy.df, ..f = function(x) flatten_chr(x), .labels = FALSE))

Solution 12 - List

The by_row function from the purrrlyr package will do this for you.

This example demonstrates

myfn <- function(row) {
  #row is a tibble with one row, and the same number of columns as the original df
  l <- as.list(row)
  return(l)
}
  
list_of_lists <- purrrlyr::by_row(df, myfn, .labels=FALSE)$.out

By default, the returned value from myfn is put into a new list column in the df called .out. The $.out at the end of the above statement immediately selects this column, returning a list of lists.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionRoman LuštrikView Question on Stackoverflow
Solution 1 - ListflodelView Answer on Stackoverflow
Solution 2 - ListRoman LuštrikView Answer on Stackoverflow
Solution 3 - ListQiou BiView Answer on Stackoverflow
Solution 4 - ListMike StanleyView Answer on Stackoverflow
Solution 5 - ListRonak ShahView Answer on Stackoverflow
Solution 6 - ListlmoView Answer on Stackoverflow
Solution 7 - ListArtem KlevtsovView Answer on Stackoverflow
Solution 8 - ListRRuizView Answer on Stackoverflow
Solution 9 - ListmichalView Answer on Stackoverflow
Solution 10 - Listuser3553260View Answer on Stackoverflow
Solution 11 - ListMrHopkoView Answer on Stackoverflow
Solution 12 - ListRobinLView Answer on Stackoverflow