How to convert a list of vectors of different lengths to a usable data frame in R?
R Problem Overview
I have a (fairly long) list of vectors. The vectors consist of Russian words that I got by using the strsplit() function on sentences.
The following is what head() returns:
[[1]]
[1] "модно" "создавать" "резюме" "в" "виде"
[[2]]
[1] "ты" "начианешь" "работать" "с" "этими"
[[3]]
[1] "модно" "называть" "блогер-рилейшенз" "―" "начинается" "задолго"
[[4]]
[1] "видел" "по" "сыну," "что" "он"
[[5]]
[1] "четырнадцать," "я" "поселился" "на" "улице"
[[6]]
[1] "широко" "продолжали" "род."
Note that the vectors have different lengths.
What I want is to be able to read the first word of each sentence, then the second word, the third, and so on.
The desired result would be something like this:
    P1              P2           P3                 P4    P5           P6
[1] "модно"         "создавать"  "резюме"           "в"   "виде"       NA
[2] "ты"            "начианешь"  "работать"         "с"   "этими"      NA
[3] "модно"         "называть"   "блогер-рилейшенз" "―"   "начинается" "задолго"
[4] "видел"         "по"         "сыну,"            "что" "он"         NA
[5] "четырнадцать," "я"          "поселился"        "на"  "улице"      NA
[6] "широко"        "продолжали" "род."             NA    NA           NA
I have tried to just use data.frame(), but that didn't work because the rows are of different lengths. I also tried rbind.fill() from the plyr package, but that function can only process matrices.
I found some other questions here (that's where I got the plyr help from), but those were all about combining, for instance, two data frames of different sizes.
Thanks for your help.
R Solutions
Solution 1 - R
One-liner with plyr:
plyr::ldply(word.list, rbind)
Solution 2 - R
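Try this:
# Example data: vectors of different lengths
word.list <- list(letters[1:4], letters[1:5], letters[1:2], letters[1:6])
n.obs <- sapply(word.list, length)
# Index positions up to the length of the longest vector
seq.max <- seq_len(max(n.obs))
# Index every vector with the full range, then transpose so each vector becomes a row
mat <- t(sapply(word.list, "[", i = seq.max))
The trick is that indexing past the end of a vector returns NA, so
c(1:2)[1:4]
returns the vector followed by two NAs.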
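To make the indexing trick concrete, here is a minimal base-R sketch using the same example data. The names padded and words.df, and the P1, P2, ... column labels (borrowed from the desired output in the question), are illustrative assumptions and not part of the original answer.

```r
# Indexing past the end of a vector pads the result with NA.
short <- c("a", "b")
padded <- short[1:4]
print(padded)

# Same example data as in the answer above.
word.list <- list(letters[1:4], letters[1:5], letters[1:2], letters[1:6])
seq.max <- seq_len(max(lengths(word.list)))
mat <- t(sapply(word.list, "[", i = seq.max))

# Turn the padded matrix into a data frame with one column per word position.
words.df <- as.data.frame(mat, stringsAsFactors = FALSE)
names(words.df) <- paste0("P", seq_along(words.df))
print(words.df)

# First word of every vector:
print(words.df$P1)
```

With the data in this shape, each column holds the words at one position, which is what the question asks for.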
Solution 3 - R
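Another option is stri_list2matrix() from the stringi package:
library(stringi)
# Example data (see the NOTE below)
l <- list(c("a","b","c"), c("a2","b2"), c("a3","b3","c3","d3"))
stri_list2matrix(l, byrow=TRUE)
#      [,1] [,2] [,3] [,4]
# [1,] "a"  "b"  "c"  NA
# [2,] "a2" "b2" NA   NA
# [3,] "a3" "b3" "c3" "d3"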
NOTE: Data from @juba's post.
Or, as @Valentin mentioned in the comments:
sapply(l, "length<-", max(lengths(l)))
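One detail worth checking with the sapply() variant is the orientation of the result: sapply() places each padded vector in a column, so the matrix comes out transposed relative to the stri_list2matrix(..., byrow=TRUE) output. A small sketch, assuming the same example list l (the names by.col and by.row are illustrative):

```r
# Example data, as used elsewhere on this page.
l <- list(c("a", "b", "c"), c("a2", "b2"), c("a3", "b3", "c3", "d3"))

# "length<-" extends each vector to the maximum length, padding with NA.
by.col <- sapply(l, "length<-", max(lengths(l)))
print(dim(by.col))  # 4 3: one column per list element

# Transpose to get one row per list element.
by.row <- t(by.col)
print(by.row)
```

Note that lengths() requires R 3.2.0 or later; sapply(l, length) is an equivalent for older versions.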
Solution 4 - R
You can do something like this:
## Example data
l <- list(c("a","b","c"), c("a2","b2"), c("a3","b3","c3","d3"))
## Compute maximum length
max.length <- max(sapply(l, length))
## Add NA values to list elements
l <- lapply(l, function(v) { c(v, rep(NA, max.length - length(v))) })
## Rbind
do.call(rbind, l)
Which gives:
     [,1] [,2] [,3] [,4]
[1,] "a"  "b"  "c"  NA
[2,] "a2" "b2" NA   NA
[3,] "a3" "b3" "c3" "d3"
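Since the question asks for a data frame rather than a matrix, the padding steps above could be wrapped in a small helper. This is only a sketch: the function name list_to_df and its prefix argument are hypothetical, and it assumes a non-empty list of character vectors.

```r
# Hypothetical helper (not from the original answer): pad every vector with NA
# to the length of the longest one, bind the rows, and return a data frame.
list_to_df <- function(x, prefix = "P") {
  max.length <- max(lengths(x))
  padded <- lapply(x, function(v) c(v, rep(NA, max.length - length(v))))
  out <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
  names(out) <- paste0(prefix, seq_len(max.length))
  out
}

l <- list(c("a", "b", "c"), c("a2", "b2"), c("a3", "b3", "c3", "d3"))
words.df <- list_to_df(l)
print(words.df)

# Read the first element of every vector, then the fourth.
print(words.df$P1)
print(words.df$P4)
```

Keeping stringsAsFactors = FALSE makes the columns plain character vectors on R versions before 4.0.0, where factors were the default.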
Solution 5 - R
You could also use rbindlist() from the data.table package.
Use lapply() to transpose each vector and convert it to a one-row data.table or data.frame (not sure if this reduces speed a lot). Then bind them with rbindlist(), filling missing cells with NA.
library(data.table)
l <- list(c("a","b","c"), c("a2","b2"), c("a3","b3","c3","d3"))
# Each vector becomes a one-row data.table; fill = TRUE pads short rows with NA
dt <- rbindlist(
  lapply(l, function(x) data.table(t(x))),
  fill = TRUE
)
Solution 6 - R
Another option is to define a function like this (it mimics rbind.fill, but binds columns) or to use it directly from the rowr package:
cbind.fill <- function(...) {
  nm <- list(...)
  # Coerce every input to a matrix
  nm <- lapply(nm, as.matrix)
  # Largest row count among the inputs
  n <- max(sapply(nm, nrow))
  # Pad each matrix with NA rows up to n rows, then bind the columns
  do.call(cbind, lapply(nm, function(x)
    rbind(x, matrix(, n - nrow(x), ncol(x)))))
}
This answer is taken from here (the original post also includes some usage examples).
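As a rough illustration of how cbind.fill could be applied to the list in the question, the sketch below repeats the function definition so that it runs on its own, passes the list elements via do.call(), and transposes the result so each vector becomes a row. The example data and the names filled and by.row are assumptions for illustration only.

```r
# Same function as above, repeated so this example is self-contained.
cbind.fill <- function(...) {
  nm <- list(...)
  nm <- lapply(nm, as.matrix)
  n <- max(sapply(nm, nrow))
  do.call(cbind, lapply(nm, function(x)
    rbind(x, matrix(, n - nrow(x), ncol(x)))))
}

l <- list(c("a", "b", "c"), c("a2", "b2"), c("a3", "b3", "c3", "d3"))

# Each vector becomes one NA-padded column.
filled <- do.call(cbind.fill, l)
print(dim(filled))  # 4 3

# Transpose to get one row per original vector.
by.row <- t(filled)
print(by.row)
```

Because the function fills columns, the transpose is needed here to get the row-per-sentence layout that the question asks for.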