How to flatten a list to a list without coercion?

ListRStandard LibraryFlattenType Coercion

List Problem Overview


I am trying to achieve the functionality similar to unlist, with the exception that types are not coerced to a vector, but the list with preserved types is returned instead. For instance:

flatten(list(NA, list("TRUE", list(FALSE), 0L))

should return

list(NA, "TRUE", FALSE, 0L)

instead of

c(NA, "TRUE", "FALSE", "0")

which would be returned by unlist(list(list(NA, list("TRUE", list(FALSE), 0L)).

As it is seen from the example above, the flattening should be recursive. Is there a function in standard R library which achieves this, or at least some other function which can be used to easily and efficiently implement this?

UPDATE: I don't know if it is clear from the above, but non-lists should not be flattened, i.e. flatten(list(1:3, list(4, 5))) should return list(c(1, 2, 3), 4, 5).

List Solutions


Solution 1 - List

Interesting non-trivial problem!

MAJOR UPDATE With all that's happened, I've rewrote the answer and removed some dead ends. I also timed the various solutions on different cases.

Here's the first, rather simple but slow, solution:

flatten1 <- function(x) {
  y <- list()
  rapply(x, function(x) y <<- c(y,x))
  y
}

rapply lets you traverse a list and apply a function on each leaf element. Unfortunately, it works exactly as unlist with the returned values. So I ignore the result from rapply and instead I append values to the variable y by doing <<-.

Growing y in this manner is not very efficient (it's quadratic in time). So if there are many thousands of elements this will be very slow.

A more efficient approach is the following, with simplifications from @JoshuaUlrich:

flatten2 <- function(x) {
  len <- sum(rapply(x, function(x) 1L))
  y <- vector('list', len)
  i <- 0L
  rapply(x, function(x) { i <<- i+1L; y[[i]] <<- x })
  y
}

Here I first find out the result length and pre-allocate the vector. Then I fill in the values. As you can will see, this solution is much faster.

Here's a version of @JoshO'Brien great solution based on Reduce, but extended so it handles arbitrary depth:

flatten3 <- function(x) {
  repeat {
    if(!any(vapply(x, is.list, logical(1)))) return(x)
    x <- Reduce(c, x)
  }
}

Now let the battle begin!

# Check correctness on original problem 
x <- list(NA, list("TRUE", list(FALSE), 0L))
dput( flatten1(x) )
#list(NA, "TRUE", FALSE, 0L)
dput( flatten2(x) )
#list(NA, "TRUE", FALSE, 0L)
dput( flatten3(x) )
#list(NA_character_, "TRUE", FALSE, 0L)

# Time on a huge flat list
x <- as.list(1:1e5)
#system.time( flatten1(x) )  # Long time
system.time( flatten2(x) )  # 0.39 secs
system.time( flatten3(x) )  # 0.04 secs

# Time on a huge deep list
x <-'leaf'; for(i in 1:11) { x <- list(left=x, right=x, value=i) }
#system.time( flatten1(x) ) # Long time
system.time( flatten2(x) )  # 0.05 secs
system.time( flatten3(x) )  # 1.28 secs

...So what we observe is that the Reduce solution is faster when the depth is low, and the rapply solution is faster when the depth is large!

As correctness goes, here are some tests:

> dput(flatten1( list(1:3, list(1:3, 'foo')) ))
list(1L, 2L, 3L, 1L, 2L, 3L, "foo")
> dput(flatten2( list(1:3, list(1:3, 'foo')) ))
list(1:3, 1:3, "foo")
> dput(flatten3( list(1:3, list(1:3, 'foo')) ))
list(1L, 2L, 3L, 1:3, "foo")

Unclear what result is desired, but I lean towards the result from flatten2...

Solution 2 - List

For lists that are only a few nestings deep, you could use Reduce() and c() to do something like the following. Each application of c() removes one level of nesting. (For fully general solution, see EDITs below.)

L <- (list(NA, list("TRUE", list(FALSE), 0L)))
Reduce(c, Reduce(c, L))
[[1]]
[1] NA

[[2]]
[1] "TRUE"

[[3]]
[1] FALSE

[[4]]
[1] 0



# TIMING TEST
x <- as.list(1:4e3)
system.time(flatten(x))   # Using the improved version    
# user  system elapsed 
# 0.14    0.00    0.13 
system.time(Reduce(c, x))
# user  system elapsed 
# 0.04    0.00    0.03 

EDIT Just for fun, here's a version of @Tommy's version of @JoshO'Brien's solution that does work for already flat lists. FURTHER EDIT Now @Tommy's solved that problem as well, but in a cleaner way. I'll leave this version in place.

flatten <- function(x) {
    x <- list(x)
    repeat {
        x <- Reduce(c, x)
        if(!any(vapply(x, is.list, logical(1)))) return(x)
    }
}

flatten(list(3, TRUE, 'foo'))
# [[1]]
# [1] 3
# 
# [[2]]
# [1] TRUE
# 
# [[3]]
# [1] "foo"

Solution 3 - List

How about this? It builds off Josh O'Brien's solution but does the recursion with a while loop instead using unlist with recursive=FALSE.

flatten4 <- function(x) {
  while(any(vapply(x, is.list, logical(1)))) { 
    # this next line gives behavior like Tommy's answer; 
    # removing it gives behavior like Josh's
    x <- lapply(x, function(x) if(is.list(x)) x else list(x))
    x <- unlist(x, recursive=FALSE) 
  }
  x
}

Keeping the commented line in gives results like this (which Tommy prefers, and so do I, for that matter).

> x <- list(1:3, list(1:3, 'foo'))
> dput(flatten4(x))
list(1:3, 1:3, "foo")

Output from my system, using Tommy's tests:

dput(flatten4(foo))
#list(NA, "TRUE", FALSE, 0L)

# Time on a long 
x <- as.list(1:1e5)
system.time( x2 <- flatten2(x) )  # 0.48 secs
system.time( x3 <- flatten3(x) )  # 0.07 secs
system.time( x4 <- flatten4(x) )  # 0.07 secs
identical(x2, x4) # TRUE
identical(x3, x4) # TRUE

# Time on a huge deep list
x <-'leaf'; for(i in 1:11) { x <- list(left=x, right=x, value=i) }
system.time( x2 <- flatten2(x) )  # 0.05 secs
system.time( x3 <- flatten3(x) )  # 1.45 secs
system.time( x4 <- flatten4(x) )  # 0.03 secs
identical(x2, unname(x4)) # TRUE
identical(unname(x3), unname(x4)) # TRUE

EDIT: As for getting the depth of a list, maybe something like this would work; it gets the index for each element recursively.

depth <- function(x) {
  foo <- function(x, i=NULL) {
    if(is.list(x)) { lapply(seq_along(x), function(xi) foo(x[[xi]], c(i,xi))) }
    else { i }
  }
  flatten4(foo(x))
}

It's not super fast but it seems to work fine.

x <- as.list(1:1e5)
system.time(d <- depth(x)) # 0.327 s

x <-'leaf'; for(i in 1:11) { x <- list(left=x, right=x, value=i) }
system.time(d <- depth(x)) # 0.041s

I'd imagined it being used this way:

> x[[ d[[5]] ]]
[1] "leaf"
> x[[ d[[6]] ]]
[1] 1
    

But you could also get a count of how many nodes are at each depth too.

> table(sapply(d, length))

   1    2    3    4    5    6    7    8    9   10   11 
   1    2    4    8   16   32   64  128  256  512 3072 

Solution 4 - List

Edited to address a flaw pointed out in the comments. Sadly, it just makes it even less efficient. Ah well.

Another approach, although I'm not sure it will be more efficient than anything @Tommy has suggested:

l <- list(NA, list("TRUE", list(FALSE), 0L))
 
flatten <- function(x){
    obj <- rapply(x,identity,how = "unlist")
    cl <- rapply(x,class,how = "unlist")
    len <- rapply(x,length,how = "unlist")
    cl <- rep(cl,times = len)
    mapply(function(obj,cl){rs <- as(obj,cl); rs}, obj, cl, 
        SIMPLIFY = FALSE, USE.NAMES = FALSE)
}

> flatten(l)
[[1]]
[1] NA

[[2]]
[1] "TRUE"

[[3]]
[1] FALSE

[[4]]
[1] 0

Solution 5 - List

purrr::flatten achieves that. Though it is not recursive (by design).

So applying it twice should work:

library(purrr)
l <- list(NA, list("TRUE", list(FALSE), 0L))
flatten(flatten(l))

Here is an attempt at a recursive version:

flatten_recursive <- function(x) {
  stopifnot(is.list(x))
  if (any(vapply(x, is.list, logical(1)))) Recall(purrr::flatten(x)) else x
}
flatten_recursive(l)

Solution 6 - List

hack_list <- function(.list) {
  .list[['_hack']] <- function() NULL
  .list <- unlist(.list)
  .list$`_hack` <- NULL
  .list
}

Solution 7 - List

You can also use rrapply in the rrapply-package (extended version of base-rapply) by setting how = "flatten":

library(rrapply)

rrapply(list(NA, list("TRUE", list(FALSE), 0L)), how = "flatten")
#> [[1]]
#> [1] NA
#> 
#> [[2]]
#> [1] "TRUE"
#> 
#> [[3]]
#> [1] FALSE
#> 
#> [[4]]
#> [1] 0

Computation times

Below are some benchmark timings against the flatten2 and flatten3 functions in Tommy's response for two large nested lists:

flatten2 <- function(x) {
  len <- sum(rapply(x, function(x) 1L))
  y <- vector('list', len)
  i <- 0L
  rapply(x, function(x) { i <<- i+1L; y[[i]] <<- x })
  y
}

flatten3 <- function(x) {
  repeat {
    if(!any(vapply(x, is.list, logical(1)))) return(x)
    x <- Reduce(c, x)
  }
}

## large deeply nested list (1E6 elements, 6 layers)
deep_list <- rrapply(replicate(10, 1, simplify = F), classes = c("list", "numeric"), condition = function(x, .xpos) length(.xpos) < 6, f = function(x) replicate(10, 1, simplify = F), how = "recurse")

system.time(flatten2(deep_list))
#>    user  system elapsed 
#>   1.715   0.012   1.727
## system.time(flatten3(deep_list)), not run takes more than 10 minutes
system.time(rrapply(deep_list, how = "flatten"))
#>    user  system elapsed 
#>   0.105   0.016   0.121

## large shallow nested list (1E6 elements, 2 layers)
shallow_list <- lapply(replicate(1000, 1, simplify = F), function(x) replicate(1000, 1, simplify = F))

system.time(flatten2(shallow_list))
#>    user  system elapsed 
#>   1.308   0.040   1.348
system.time(flatten3(shallow_list))
#>    user  system elapsed 
#>   5.246   0.012   5.259
system.time(rrapply(shallow_list, how = "flatten"))
#>    user  system elapsed 
#>    0.09    0.00    0.09

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestioneoldView Question on Stackoverflow
Solution 1 - ListTommyView Answer on Stackoverflow
Solution 2 - ListJosh O'BrienView Answer on Stackoverflow
Solution 3 - ListAaron left Stack OverflowView Answer on Stackoverflow
Solution 4 - ListjoranView Answer on Stackoverflow
Solution 5 - ListAurèleView Answer on Stackoverflow
Solution 6 - ListMullefaView Answer on Stackoverflow
Solution 7 - ListJoris C.View Answer on Stackoverflow