Flattening a collection

Java Collections

Java Problem Overview


Say I have a Map<? extends Object, List<String>>

I can get the values of the map easily enough, and iterate over them to produce a single List<String>.

    for (List<String> list : someMap.values()) {
        someList.addAll(list);
    }

Is there a way to flatten it in one shot?

    List<String> someList = someMap.values().flatten();

Java Solutions


Solution 1 - Java

If you are using Java 8 and would rather not instantiate a List yourself, as the suggested (and accepted) solution does:

someMap.values().forEach(someList::addAll);

you could do it all with streams in a single statement:

List<String> someList = someMap.values().stream().flatMap(c -> c.stream()).collect(Collectors.toList());
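
For reference, here is a self-contained sketch of the streaming variant. The class name FlattenWithStreams and the sample data are illustrative only and not part of the original answer; a LinkedHashMap is assumed so that the output order is predictable.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of the original answer.
class FlattenWithStreams {

    // Flattens all value lists of the map into one newly created list.
    static List<String> flatten(Map<?, List<String>> someMap) {
        return someMap.values().stream()
                .flatMap(List::stream)          // one stream over all elements
                .collect(Collectors.toList());  // gathered into a new List
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the result is deterministic.
        Map<String, List<String>> someMap = new LinkedHashMap<>();
        someMap.put("test", Arrays.asList("1", "2"));
        someMap.put("test2", Arrays.asList("10", "20"));
        System.out.println(flatten(someMap)); // prints [1, 2, 10, 20]
    }
}
```

With a plain HashMap the elements of each list stay together, but the order of the lists themselves is not guaranteed.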

Incidentally, it is interesting that on Java 8 the accepted version does seem to be the fastest. It takes about as long as a plain

for (List<String> item : someMap.values()) ...

and is much faster than the pure streaming solution. Here is my little test code. I deliberately do not call it a benchmark, to avoid the inevitable discussion of benchmark flaws. ;) I run every test twice, hoping that the second run measures fully compiled code.

	Map<String, List<String>> map = new HashMap<>();
	long millis;

	map.put("test", Arrays.asList("1", "2", "3", "4"));
	map.put("test2", Arrays.asList("10", "20", "30", "40"));
	map.put("test3", Arrays.asList("100", "200", "300", "400"));

	int maxcounter = 1000000;
	
	System.out.println("1 stream flatmap");
	millis = System.currentTimeMillis();
	for (int i = 0; i < maxcounter; i++) {
		List<String> someList = map.values().stream().flatMap(c -> c.stream()).collect(Collectors.toList());
	}
	System.out.println(System.currentTimeMillis() - millis);
	
	System.out.println("1 parallel stream flatmap");
	millis = System.currentTimeMillis();
	for (int i = 0; i < maxcounter; i++) {
		List<String> someList = map.values().parallelStream().flatMap(c -> c.stream()).collect(Collectors.toList());
	}
	System.out.println(System.currentTimeMillis() - millis);

	System.out.println("1 foreach");
	millis = System.currentTimeMillis();
	for (int i = 0; i < maxcounter; i++) {
		List<String> mylist = new ArrayList<String>();
		map.values().forEach(mylist::addAll);
	}
	System.out.println(System.currentTimeMillis() - millis);		

	System.out.println("1 for");
	millis = System.currentTimeMillis();
	for (int i = 0; i < maxcounter; i++) {
		List<String> mylist = new ArrayList<String>();
		for (List<String> item : map.values()) {
			mylist.addAll(item);
		}
	}
	System.out.println(System.currentTimeMillis() - millis);
	
	
	System.out.println("2 stream flatmap");
	millis = System.currentTimeMillis();
	for (int i = 0; i < maxcounter; i++) {
		List<String> someList = map.values().stream().flatMap(c -> c.stream()).collect(Collectors.toList());
	}
	System.out.println(System.currentTimeMillis() - millis);
	
	System.out.println("2 parallel stream flatmap");
	millis = System.currentTimeMillis();
	for (int i = 0; i < maxcounter; i++) {
		List<String> someList = map.values().parallelStream().flatMap(c -> c.stream()).collect(Collectors.toList());
	}
	System.out.println(System.currentTimeMillis() - millis);
	
	System.out.println("2 foreach");
	millis = System.currentTimeMillis();
	for (int i = 0; i < maxcounter; i++) {
		List<String> mylist = new ArrayList<String>();
		map.values().forEach(mylist::addAll);
	}
	System.out.println(System.currentTimeMillis() - millis);		

	System.out.println("2 for");
	millis = System.currentTimeMillis();
	for (int i = 0; i < maxcounter; i++) {
		List<String> mylist = new ArrayList<String>();
		for (List<String> item : map.values()) {
			mylist.addAll(item);
		}
	}
	System.out.println(System.currentTimeMillis() - millis);

And here are the results:

1 stream flatmap
468
1 parallel stream flatmap
1529
1 foreach
140
1 for
172
2 stream flatmap
296
2 parallel stream flatmap
1482
2 foreach
156
2 for
141

Edit 2016-05-24 (two years later):

Running the same test with a current Java 8 version (8u92) on the same machine:

1 stream flatmap
313
1 parallel stream flatmap
3257
1 foreach
109
1 for
141
2 stream flatmap
219
2 parallel stream flatmap
3830
2 foreach
125
2 for
140

It seems that there is a speedup for sequential processing of streams and an even larger overhead for parallel streams.

Edit 2018-10-18 (four years later):

Now using Java 10 (10.0.2) on the same machine:

1 stream flatmap
393
1 parallel stream flatmap
3683
1 foreach
157
1 for
175
2 stream flatmap
243
2 parallel stream flatmap
5945
2 foreach
128
2 for
187

The overhead of parallel streams seems to have grown even larger.

Edit 2020-05-22 (six years later):

Now using Java 14 (14.0.0.36) on a different machine:

1 stream flatmap
299
1 parallel stream flatmap
3209
1 foreach
202
1 for
170
2 stream flatmap
178
2 parallel stream flatmap
3270
2 foreach
138
2 for
167

Note that this run was done on a different machine (though, I think, a comparable one). The parallel streaming overhead seems to be considerably smaller than before.
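
The repeated timing blocks in the test code above could be condensed with a small helper. The following is only a sketch under that assumption: the class TimingSketch and the method timeMillis are hypothetical names, and the same caveats about benchmark flaws apply (for instance, the JIT may optimize away work whose result is never used).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of the original answer.
class TimingSketch {

    // Runs body the given number of times and returns the elapsed milliseconds.
    static long timeMillis(int iterations, Runnable body) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            body.run();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        Map<String, List<String>> map = new HashMap<>();
        map.put("test", Arrays.asList("1", "2", "3", "4"));
        map.put("test2", Arrays.asList("10", "20", "30", "40"));
        map.put("test3", Arrays.asList("100", "200", "300", "400"));

        int maxcounter = 1000000;

        // Two rounds, as in the original test, so the second one runs warmed up.
        for (int round = 1; round <= 2; round++) {
            System.out.println(round + " stream flatmap");
            System.out.println(timeMillis(maxcounter, () ->
                    map.values().stream().flatMap(c -> c.stream()).collect(Collectors.toList())));

            System.out.println(round + " foreach");
            System.out.println(timeMillis(maxcounter, () -> {
                List<String> mylist = new ArrayList<String>();
                map.values().forEach(mylist::addAll);
            }));
        }
    }
}
```

For numbers that need to hold up to scrutiny, a dedicated harness such as JMH is the usual recommendation.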

Solution 2 - Java

If you are using Java 8, you could do something like this:

someMap.values().forEach(someList::addAll);
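
Spelled out as a complete, hedged sketch: the target list has to exist before the call, and forEach then appends every value list to it. The class name FlattenWithForEach and the sample data are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class, not part of the original answer.
class FlattenWithForEach {

    static List<String> flatten(Map<?, List<String>> someMap) {
        List<String> someList = new ArrayList<>();     // must be created up front
        someMap.values().forEach(someList::addAll);    // append each value list
        return someList;
    }

    public static void main(String[] args) {
        // LinkedHashMap is used only to make the printed order predictable.
        Map<String, List<String>> someMap = new LinkedHashMap<>();
        someMap.put("test", Arrays.asList("1", "2"));
        someMap.put("test2", Arrays.asList("10", "20"));
        System.out.println(flatten(someMap)); // prints [1, 2, 10, 20]
    }
}
```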

Solution 3 - Java

When searching for "java 8 flatten", this is the only hit, and it is not even about flattening a stream. So, for the greater good, I will just leave this here:

.flatMap(Collection::stream)

I'm also surprised that no one has given a concurrent Java 8 answer to the original question, which is:

.collect(ArrayList::new, ArrayList::addAll, ArrayList::addAll);
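
To show where that fragment would sit, here is a minimal sketch that applies the three-argument collect to a parallel stream of the map's values. The class name FlattenWithCollect and the sample data are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class, not part of the original answer.
class FlattenWithCollect {

    static List<String> flatten(Map<?, List<String>> someMap) {
        return someMap.values().parallelStream()
                .collect(ArrayList::new,      // supplier: a fresh partial result per thread
                         ArrayList::addAll,   // accumulator: append one value list
                         ArrayList::addAll);  // combiner: merge two partial results
    }

    public static void main(String[] args) {
        Map<String, List<String>> someMap = new LinkedHashMap<>();
        someMap.put("test", Arrays.asList("1", "2"));
        someMap.put("test2", Arrays.asList("10", "20"));
        System.out.println(flatten(someMap)); // prints [1, 2, 10, 20]
    }
}
```

Because every thread fills its own ArrayList and the partial lists are merged afterwards, no shared list is mutated concurrently.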

Solution 4 - Java

Suggested by a colleague:

listOfLists.stream().flatMap(e -> e.stream()).collect(Collectors.toList())

I like it better than forEach().

Solution 5 - Java

If you're using Eclipse Collections, you can use Iterate.flatten().

MutableMap<String, MutableList<String>> map = Maps.mutable.empty();
map.put("Even", Lists.mutable.with("0", "2", "4"));
map.put("Odd", Lists.mutable.with("1", "3", "5"));
MutableList<String> flattened = Iterate.flatten(map, Lists.mutable.empty());
Assert.assertEquals(
    Lists.immutable.with("0", "1", "2", "3", "4", "5"),
    flattened.toSortedList());

flatten() is a special case of the more general RichIterable.flatCollect().

MutableList<String> flattened = 
    map.flatCollect(x -> x, Lists.mutable.empty());

Note: I am a committer for Eclipse Collections.

Solution 6 - Java

No, there is no shorter method. You have to use a loop.

Update Apr 2014: Java 8 has finally come out. In the new version you can use the Iterable.forEach method to walk over a collection without using an explicit loop.

Update Nov 2017: Found this question by chance when looking for a modern solution. Ended up going with reduce:

someMap.values().stream().reduce(new ArrayList<>(), (accum, list) -> {
    accum.addAll(list);
    return accum;
});

This avoids both the dependence on mutable external state of forEach(someList::addAll) and the overhead of flatMap(List::stream). Note that the reduction mutates its identity value, so it is only safe on a sequential stream.

Solution 7 - Java

If you just want to iterate through values, you can avoid all these addAll methods.

All you have to do is write a class that encapsulates your Map and implements Iterator:

import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

public class ListMap<K,V> implements Iterator<V>
{
  private final Iterator<Map.Entry<K,List<V>>> _it1; // walks the map entries
  private Iterator<V> _it2 = null;                   // walks the current list

  public ListMap(Map<K,List<V>> map)
  {
    _it1 = map.entrySet().iterator();
    nextList();
  }

  public boolean hasNext()
  {
    return _it2!=null && _it2.hasNext();
  }

  public V next()
  {
    if(!hasNext())
    {
      throw new NoSuchElementException();
    }
    V value = _it2.next();
    nextList(); // the current list may now be exhausted
    return value;
  }

  public void remove()
  {
    throw new UnsupportedOperationException();
  }

  // Advances _it2 to the next non-empty list, if there is one.
  private void nextList()
  {
    while(_it1.hasNext() && (_it2==null || !_it2.hasNext()))
    {
      _it2 = _it1.next().getValue().iterator();
    }
  }
}
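
If a dedicated class feels like too much, a similar effect (iterating over all values without copying them into a new list) can probably be had from a stream's iterator. This is only a sketch: the class name FlatIteratorSketch, the method flatIterator and the sample data are made up for illustration.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class, not part of the original answer.
class FlatIteratorSketch {

    // Returns an iterator over every element of every value list.
    static <K, V> Iterator<V> flatIterator(Map<K, List<V>> map) {
        return map.values().stream().flatMap(List::stream).iterator();
    }

    public static void main(String[] args) {
        Map<String, List<String>> someMap = new LinkedHashMap<>();
        someMap.put("test", Arrays.asList("1", "2"));
        someMap.put("empty", Arrays.<String>asList()); // empty lists are skipped
        someMap.put("test2", Arrays.asList("10", "20"));

        Iterator<String> it = flatIterator(someMap);
        while (it.hasNext()) {
            System.out.println(it.next()); // prints 1, 2, 10, 20 on separate lines
        }
    }
}
```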

Solution 8 - Java

A nice solution for the subcase of a Map of Maps is to store, if possible, the data in Guava's Table.

https://github.com/google/guava/wiki/NewCollectionTypesExplained#table

So, for instance, a Map<String,Map<String,String>> is replaced by a Table<String,String,String>, which is already flattened. In fact, the docs say that HashBasedTable, Table's hash-based implementation, is essentially backed by a HashMap<R, HashMap<C, V>>.

Solution 9 - Java

Flatten via a function (note that Collectors.toUnmodifiableList() requires Java 10 or later):

    private <A, T> List<T> flatten(List<A> list, Function<A, List<T>> flattenFn) {
        return list
                .stream()
                .map(flattenFn)
                .flatMap(Collection::stream)
                .collect(Collectors.toUnmodifiableList());
    }
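
A possible way to apply such a helper to the map from the question is sketched below. The class name FlattenByFunction and the sample data are assumptions; the helper is repeated so the block compiles on its own, made static, and uses Collectors.toList() instead of toUnmodifiableList() so that it also runs on Java 8.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of the original answer.
class FlattenByFunction {

    // Same idea as the helper above: extract a list from each item, then flatten.
    static <A, T> List<T> flatten(List<A> list, Function<A, List<T>> flattenFn) {
        return list
                .stream()
                .map(flattenFn)
                .flatMap(Collection::stream)
                .collect(Collectors.toList()); // swapped in for Java 8 compatibility
    }

    public static void main(String[] args) {
        Map<String, List<String>> someMap = new LinkedHashMap<>();
        someMap.put("test", Arrays.asList("1", "2"));
        someMap.put("test2", Arrays.asList("10", "20"));

        // Treat the map entries as the outer list and pull out each value list.
        List<Map.Entry<String, List<String>>> entries = new ArrayList<>(someMap.entrySet());
        List<String> flat = flatten(entries, Map.Entry::getValue);
        System.out.println(flat); // prints [1, 2, 10, 20]
    }
}
```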

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Tony Ennis | View Question on Stackoverflow
Solution 1 - Java | wumpz | View Answer on Stackoverflow
Solution 2 - Java | Josh M | View Answer on Stackoverflow
Solution 3 - Java | user2418306 | View Answer on Stackoverflow
Solution 4 - Java | orbfish | View Answer on Stackoverflow
Solution 5 - Java | Craig P. Motlin | View Answer on Stackoverflow
Solution 6 - Java | Joni | View Answer on Stackoverflow
Solution 7 - Java | David | View Answer on Stackoverflow
Solution 8 - Java | Guy Grin | View Answer on Stackoverflow
Solution 9 - Java | Leo Duarte | View Answer on Stackoverflow