Is there a common Java utility to break a list into batches?

Java, Collections

Java Problem Overview


I wrote myself a utility to break a list into batches of a given size. I would like to know whether Apache Commons already provides a utility for this.

public static <T> List<List<T>> getBatches(List<T> collection,int batchSize){
	int i = 0;
	List<List<T>> batches = new ArrayList<List<T>>();
	while(i<collection.size()){
		int nextInc = Math.min(collection.size()-i,batchSize);
		List<T> batch = collection.subList(i,i+nextInc);
		batches.add(batch);
		i = i + nextInc;
	}
	
	return batches;
}

Please let me know if an existing utility already does this.

Java Solutions


Solution 1 - Java

Check out Lists.partition(java.util.List, int) (https://google.github.io/guava/releases/19.0/api/docs/com/google/common/collect/Lists.html#partition(java.util.List, int)) from Google Guava:

> Returns consecutive sublists of a list, each of the same size (the final list may be smaller). For example, partitioning a list containing [a, b, c, d, e] with a partition size of 3 yields [[a, b, c], [d, e]] -- an outer list containing two inner lists of three and two elements, all in the original order.

Solution 2 - Java

In case you want to produce a Java-8 stream of batches, you can try the following code:

public static <T> Stream<List<T>> batches(List<T> source, int length) {
    if (length <= 0)
        throw new IllegalArgumentException("length = " + length);
    int size = source.size();
    if (size <= 0)
        return Stream.empty();
    int fullChunks = (size - 1) / length;
    return IntStream.range(0, fullChunks + 1).mapToObj(
        n -> source.subList(n * length, n == fullChunks ? size : (n + 1) * length));
}

public static void main(String[] args) {
	List<Integer> list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14);

	System.out.println("By 3:");
	batches(list, 3).forEach(System.out::println);
	
	System.out.println("By 4:");
	batches(list, 4).forEach(System.out::println);
}

Output:

By 3:
[1, 2, 3]
[4, 5, 6]
[7, 8, 9]
[10, 11, 12]
[13, 14]
By 4:
[1, 2, 3, 4]
[5, 6, 7, 8]
[9, 10, 11, 12]
[13, 14]

Solution 3 - Java

Use Apache Commons ListUtils.partition.

org.apache.commons.collections4.ListUtils.partition(final List<T> list, final int size)

Solution 4 - Java

Another approach is to use Collectors.groupingBy of indices and then map the grouped indices to the actual elements:

    final List<Integer> numbers = range(1, 12)
            .boxed()
            .collect(toList());
    System.out.println(numbers);

    final List<List<Integer>> groups = range(0, numbers.size())
            .boxed()
            .collect(groupingBy(index -> index / 4))
            .values()
            .stream()
            .map(indices -> indices
                    .stream()
                    .map(numbers::get)
                    .collect(toList()))
            .collect(toList());
    System.out.println(groups);

Output:

> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
>
> [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11]]

Solution 5 - Java

With Java 9 you can use IntStream.iterate() with a hasNext condition, which simplifies your method to this:

public static <T> List<List<T>> getBatches(List<T> collection, int batchSize) {
    return IntStream.iterate(0, i -> i < collection.size(), i -> i + batchSize)
            .mapToObj(i -> collection.subList(i, Math.min(i + batchSize, collection.size())))
            .collect(Collectors.toList());
}

Using {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, the result of getBatches(numbers, 4) will be:

[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]

Solution 6 - Java

Here is a simple solution for Java 8+:

public static <T> Collection<List<T>> prepareChunks(List<T> inputList, int chunkSize) {
	AtomicInteger counter = new AtomicInteger();
	return inputList.stream().collect(Collectors.groupingBy(it -> counter.getAndIncrement() / chunkSize)).values();
}
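
As a quick illustration, the sketch below calls a copy of prepareChunks on seven integers with a chunk size of 3. The class name ChunkDemo and the sample values are made up for this example. Because the counter is incremented once per element as the stream is traversed, the approach assumes a sequential stream.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

class ChunkDemo {

    // Same approach as above: a running counter assigns each element
    // to the bucket numbered (position / chunkSize).
    static <T> Collection<List<T>> prepareChunks(List<T> inputList, int chunkSize) {
        AtomicInteger counter = new AtomicInteger();
        return inputList.stream()
                .collect(Collectors.groupingBy(it -> counter.getAndIncrement() / chunkSize))
                .values();
    }

    public static void main(String[] args) {
        // Sample data, invented for the demo.
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        Collection<List<Integer>> chunks = prepareChunks(numbers, 3);

        // Three chunks result: two full ones and one holding the leftover element.
        System.out.println(chunks.size());
        chunks.forEach(System.out::println);
    }
}
```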

Solution 7 - Java

I came up with this one:

private static <T> List<List<T>> partition(Collection<T> members, int maxSize)
{
    List<List<T>> res = new ArrayList<>();

    List<T> internal = new ArrayList<>();

    for (T member : members)
    {
        internal.add(member);

        if (internal.size() == maxSize)
        {
            res.add(internal);
            internal = new ArrayList<>();
        }
    }
    if (internal.isEmpty() == false)
    {
        res.add(internal);
    }
    return res;
}
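
Since this method takes any Collection, it also works for inputs that have no subList(). The following sketch, with a hypothetical PartitionDemo class and sample data, repeats the method in condensed form so that it compiles on its own and applies it to a LinkedHashSet.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class PartitionDemo {

    // Condensed copy of the method above so the example is self-contained.
    static <T> List<List<T>> partition(Collection<T> members, int maxSize) {
        List<List<T>> res = new ArrayList<>();
        List<T> internal = new ArrayList<>();
        for (T member : members) {
            internal.add(member);
            if (internal.size() == maxSize) {
                res.add(internal);
                internal = new ArrayList<>();
            }
        }
        if (!internal.isEmpty()) {
            res.add(internal); // leftover elements form the last, smaller batch
        }
        return res;
    }

    public static void main(String[] args) {
        // A LinkedHashSet offers no index access, but it iterates in insertion order.
        Set<String> names = new LinkedHashSet<>(Arrays.asList("a", "b", "c", "d", "e"));
        System.out.println(partition(names, 2)); // prints [[a, b], [c, d], [e]]
    }
}
```

Unlike the subList-based variants, each batch here is a fresh ArrayList, so later changes to the source collection do not affect the batches.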

Solution 8 - Java

The following example demonstrates chunking of a List:

package de.thomasdarimont.labs;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SplitIntoChunks {

	public static void main(String[] args) {

		List<Integer> ints = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11);

		List<List<Integer>> chunks = chunk(ints, 4);

		System.out.printf("Ints:   %s%n", ints);
		System.out.printf("Chunks: %s%n", chunks);
	}

	public static <T> List<List<T>> chunk(List<T> input, int chunkSize) {

		int inputSize = input.size();
		int chunkCount = (int) Math.ceil(inputSize / (double) chunkSize);

		Map<Integer, List<T>> map = new HashMap<>(chunkCount);
		List<List<T>> chunks = new ArrayList<>(chunkCount);

		for (int i = 0; i < inputSize; i++) {

			map.computeIfAbsent(i / chunkSize, (ignore) -> {

				List<T> chunk = new ArrayList<>();
				chunks.add(chunk);
				return chunk;

			}).add(input.get(i));
		}

		return chunks;
	}
}

Output:

Ints:   [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
Chunks: [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11]]

Solution 9 - Java

Here is an example:

final AtomicInteger counter = new AtomicInteger();
final int partitionSize = 3;
final List<Object> list = new ArrayList<>();
list.add("A");
list.add("B");
list.add("C");
list.add("D");
list.add("E");

final Collection<List<Object>> subLists = list.stream()
        .collect(Collectors.groupingBy(it -> counter.getAndIncrement() / partitionSize))
        .values();
System.out.println(subLists);

Input: [A, B, C, D, E]

Output: [[A, B, C], [D, E]]

You can find examples here: https://e.printstacktrace.blog/divide-a-list-to-lists-of-n-size-in-Java-8/

Solution 10 - Java

There was another question that was closed as being a duplicate of this one, but if you read it closely, it's subtly different. So in case someone (like me) actually wants to split a list into a given number of almost equally sized sublists, then read on.

I simply ported the algorithm described here to Java.

@Test
public void shouldPartitionListIntoAlmostEquallySizedSublists() {

    List<String> list = Arrays.asList("a", "b", "c", "d", "e", "f", "g");
    int numberOfPartitions = 3;

    List<List<String>> split = IntStream.range(0, numberOfPartitions).boxed()
            .map(i -> list.subList(
                    partitionOffset(list.size(), numberOfPartitions, i),
                    partitionOffset(list.size(), numberOfPartitions, i + 1)))
            .collect(toList());

    assertThat(split, hasSize(numberOfPartitions));
    assertEquals(list.size(), split.stream().flatMap(Collection::stream).count());
    assertThat(split, hasItems(Arrays.asList("a", "b", "c"), Arrays.asList("d", "e"), Arrays.asList("f", "g")));
}

private static int partitionOffset(int length, int numberOfPartitions, int partitionIndex) {
    return partitionIndex * (length / numberOfPartitions) + Math.min(partitionIndex, length % numberOfPartitions);
}
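
To see how partitionOffset spreads the remainder, it helps to print the boundaries for the test data (7 elements, 3 partitions). This is only a worked example of the formula above; the class name PartitionOffsetDemo is invented for it.

```java
class PartitionOffsetDemo {

    // Same formula as above: every partition gets length / n elements,
    // and the first length % n partitions get one extra element each.
    static int partitionOffset(int length, int numberOfPartitions, int partitionIndex) {
        return partitionIndex * (length / numberOfPartitions)
                + Math.min(partitionIndex, length % numberOfPartitions);
    }

    public static void main(String[] args) {
        int length = 7;
        int numberOfPartitions = 3;

        // 7 / 3 = 2 and 7 % 3 = 1, so the boundaries are 0, 3, 5 and 7.
        for (int i = 0; i < numberOfPartitions; i++) {
            int from = partitionOffset(length, numberOfPartitions, i);
            int to = partitionOffset(length, numberOfPartitions, i + 1);
            System.out.println("partition " + i + ": [" + from + ", " + to + "), size " + (to - from));
        }
    }
}
```

The resulting sizes are 3, 2 and 2, which matches the sublists asserted in the test.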

Solution 11 - Java

Using various cheats from the web, I came to this solution:

int[] count = new int[1];
final int CHUNK_SIZE = 500;
Map<Integer, List<Long>> chunkedUsers = users.stream().collect( Collectors.groupingBy( 
    user -> {
        // Divide before incrementing so the first bucket also gets CHUNK_SIZE elements.
        return Math.floorDiv( count[0]++, CHUNK_SIZE );
    } )
);

We use count to mimic a normal collection index.
Then, we group the collection elements in buckets, using the algebraic quotient as bucket number.
The final map contains as key the bucket number, as value the bucket itself.

You can then easily do an operation on each of the buckets with:

chunkedUsers.values().forEach( ... );
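
For a complete picture, here is a small self-contained sketch of the same idea. The class name ChunkedUsersDemo, the chunk helper and the sample IDs are assumptions made for the demo. The counter is incremented after the division so that positions start at 0, and the stream is assumed to be sequential.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class ChunkedUsersDemo {

    static Map<Integer, List<Long>> chunk(List<Long> users, int chunkSize) {
        // One-element array so the lambda can mutate the counter.
        int[] count = new int[1];
        return users.stream().collect(Collectors.groupingBy(
                // Bucket number = position / chunkSize; the position advances afterwards.
                user -> Math.floorDiv(count[0]++, chunkSize)));
    }

    public static void main(String[] args) {
        // Sample user IDs, invented for the demo.
        List<Long> users = Arrays.asList(101L, 102L, 103L, 104L, 105L, 106L, 107L);
        Map<Integer, List<Long>> chunkedUsers = chunk(users, 3);

        // Run an operation on each bucket.
        chunkedUsers.values().forEach(bucket ->
                System.out.println(bucket.size() + " users: " + bucket));
    }
}
```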

Solution 12 - Java

Similar to the OP's version, without streams or libraries, but more concise:

public <T> List<List<T>> getBatches(List<T> collection, int batchSize) {
    List<List<T>> batches = new ArrayList<>();
    for (int i = 0; i < collection.size(); i += batchSize) {
        batches.add(collection.subList(i, Math.min(i + batchSize, collection.size())));
    }
    return batches;
}

Solution 13 - Java

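In the original code, the increment can be folded into the subList call:

List<T> batch = collection.subList(i,i+nextInc);
->
List<T> batch = collection.subList(i, i = i + nextInc);

The separate i = i + nextInc; statement after batches.add(batch) is then no longer needed.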
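
Applied to the whole method from the question, the change might look like the sketch below. The class name InlineIncrementDemo is made up. The trick relies on Java evaluating method arguments from left to right, so the first argument still sees the old value of i.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class InlineIncrementDemo {

    static <T> List<List<T>> getBatches(List<T> collection, int batchSize) {
        int i = 0;
        List<List<T>> batches = new ArrayList<>();
        while (i < collection.size()) {
            int nextInc = Math.min(collection.size() - i, batchSize);
            // The first argument reads the old i; the assignment in the second
            // argument then advances i and yields the new value as the end index.
            batches.add(collection.subList(i, i = i + nextInc));
        }
        return batches;
    }

    public static void main(String[] args) {
        System.out.println(getBatches(Arrays.asList(1, 2, 3, 4, 5), 2)); // prints [[1, 2], [3, 4], [5]]
    }
}
```

Whether the saved line is worth the assignment hidden inside an argument list is a matter of taste.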

Solution 14 - Java

Note that List#subList() returns a view of the underlying collection, which can result in unexpected consequences when editing the smaller lists - the edits will reflect in the original collection or may throw ConcurrentModificationException.
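
The following sketch illustrates both effects with an ArrayList, and shows that wrapping a sublist in a new ArrayList yields an independent copy. The class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

class SubListViewDemo {

    // Returns true if reading the view fails after the backing list has grown.
    static boolean viewBreaksAfterAdd(List<Integer> original) {
        List<Integer> view = original.subList(0, 2);
        original.add(6); // structural change made directly on the backing list
        try {
            view.get(0);
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Integer> original = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5));

        List<Integer> view = original.subList(0, 2);                  // backed by original
        List<Integer> copy = new ArrayList<>(original.subList(2, 4)); // independent copy

        view.set(0, 99); // writes through to the original list
        copy.set(0, -1); // affects only the copy
        System.out.println(original); // prints [99, 2, 3, 4, 5]
        System.out.println(copy);     // prints [-1, 4]

        System.out.println(viewBreaksAfterAdd(original)); // prints true
    }
}
```

If the batches must outlive changes to the source list, copying each sublist as shown for copy avoids both problems at the cost of extra allocations.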

Solution 15 - Java

Another approach to solving this:

public class CollectionUtils {

    /**
    * Splits the collection into lists with given batch size
    * @param collection to split in to batches
    * @param batchsize size of the batch
    * @param <T> it maintains the input type to output type
    * @return nested list
    */
    public static <T> List<List<T>> makeBatch(Collection<T> collection, int batchsize) {

        List<List<T>> totalArrayList = new ArrayList<>();
        List<T> tempItems = new ArrayList<>();

        Iterator<T> iterator = collection.iterator();

        for (int i = 0; i < collection.size(); i++) {
            tempItems.add(iterator.next());
            if ((i+1) % batchsize == 0) {
                totalArrayList.add(tempItems);
                tempItems = new ArrayList<>();
            }
        }

        if (tempItems.size() > 0) {
            totalArrayList.add(tempItems);
        }

        return totalArrayList;
    }

}

Solution 16 - Java

A one-liner in Java 8 would be:

import static java.util.function.Function.identity;
import static java.util.stream.Collectors.*;

private static <T> Collection<List<T>> partition(List<T> xs, int size) {
    return IntStream.range(0, xs.size())
            .boxed()
            .collect(collectingAndThen(toMap(identity(), xs::get), Map::entrySet))
            .stream()
            .collect(groupingBy(x -> x.getKey() / size, mapping(Map.Entry::getValue, toList())))
            .values();

}

Solution 17 - Java

You can use the code below to split a list into batches.

Iterable<List<T>> batchIds = Iterables.partition(list, batchSize);

You need the Google Guava library (com.google.common.collect.Iterables) to use the code above.

Solution 18 - Java

Here's a solution using vanilla Java and the super secret modulo operator :)

Given the content/order of the chunks doesn't matter, this would be the easiest approach. (When preparing stuff for multi-threading it usually doesn't matter, which elements are processed on which thread for example, just need an equal distribution).

public static <T> List<T>[] chunk(List<T> input, int chunkCount) {
	List<T>[] chunks = new List[chunkCount];

	for (int i = 0; i < chunkCount; i++) {
		chunks[i] = new LinkedList<T>();
	}

	for (int i = 0; i < input.size(); i++) {
		chunks[i % chunkCount].add(input.get(i));
	}

	return chunks;
}

Usage:

	List<String> list = Arrays.asList("a", "b", "c", "d", "e", "f", "g", "h", "i", "j");

	List<String>[] chunks = chunk(list, 4);

	for (List<String> chunk : chunks) {
		System.out.println(chunk);
	}

Output:

[a, e, i]
[b, f, j]
[c, g]
[d, h]

Solution 19 - Java

The solution below uses Java 8 Streams:

		//Sample Input
		List<String> input = new ArrayList<String>();
		IntStream.range(1,999).forEach((num) -> {
			input.add(""+num);
		});
		
		//Identify no. of batches
		int BATCH_SIZE = 10;
		int multiples = input.size() /  BATCH_SIZE;
		if(input.size()%BATCH_SIZE!=0) {
			multiples = multiples + 1;
		}
		
		//Process each batch
		IntStream.range(0, multiples).forEach((indx)->{
			List<String> batch = input.stream().skip(indx * BATCH_SIZE).limit(BATCH_SIZE).collect(Collectors.toList());
			System.out.println("Batch Items:"+batch);
		});

Solution 20 - Java

import com.google.common.collect.Lists;

List<List<T>> batches = Lists.partition(list, batchSize);

Use Lists.partition(list, batchSize), where list is your List<T>. You need to import Lists from the Google Guava package (com.google.common.collect.Lists).

It returns a List<List<T>> in which every sublist has batchSize elements, except the last one, which may be smaller.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

| Content Type | Original Author | Original Content on Stackoverflow |
| --- | --- | --- |
| Question | Harish | View Question on Stackoverflow |
| Solution 1 - Java | Tomasz Nurkiewicz | View Answer on Stackoverflow |
| Solution 2 - Java | Tagir Valeev | View Answer on Stackoverflow |
| Solution 3 - Java | Paul Rambags | View Answer on Stackoverflow |
| Solution 4 - Java | Adrian Bona | View Answer on Stackoverflow |
| Solution 5 - Java | Samuel Philipp | View Answer on Stackoverflow |
| Solution 6 - Java | atLeastD | View Answer on Stackoverflow |
| Solution 7 - Java | Raz Coren | View Answer on Stackoverflow |
| Solution 8 - Java | Thomas Darimont | View Answer on Stackoverflow |
| Solution 9 - Java | Sahar Pk | View Answer on Stackoverflow |
| Solution 10 - Java | Stefan Reisner | View Answer on Stackoverflow |
| Solution 11 - Java | Nicolas Nobelis | View Answer on Stackoverflow |
| Solution 12 - Java | Albert Hendriks | View Answer on Stackoverflow |
| Solution 13 - Java | Yohann | View Answer on Stackoverflow |
| Solution 14 - Java | Nether | View Answer on Stackoverflow |
| Solution 15 - Java | Jurrian Fahner | View Answer on Stackoverflow |
| Solution 16 - Java | Ori Popowski | View Answer on Stackoverflow |
| Solution 17 - Java | ranjanm28 | View Answer on Stackoverflow |
| Solution 18 - Java | dognose | View Answer on Stackoverflow |
| Solution 19 - Java | riteshk | View Answer on Stackoverflow |
| Solution 20 - Java | v87278 | View Answer on Stackoverflow |