Iterate an iterator by chunks (of n) in Python?

Python, Iterator

Python Problem Overview


Can you think of a nice way (maybe with itertools) to split an iterator into chunks of given size?

For example, l = [1, 2, 3, 4, 5, 6, 7] with chunks(l, 3) should become an iterator yielding [1, 2, 3], [4, 5, 6], [7].

I can think of a small program to do that, but not of a nice way using itertools.

Python Solutions


Solution 1 - Python

The grouper() recipe from the itertools documentation's recipes section comes close to what you want:

from itertools import izip_longest  # named zip_longest in Python 3

def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)

It will fill up the last chunk with a fill value, though.
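
For illustration, here is the docstring's example as an interactive session; the padding shows up in the last tuple:

>>> list(grouper(3, 'ABCDEFG', 'x'))
[('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'x', 'x')]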

A less general solution that only works on sequences but does handle the last chunk as desired is

[my_list[i:i + chunk_size] for i in range(0, len(my_list), chunk_size)]
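
For example, with the list from the question and a chunk size of 3:

>>> my_list = [1, 2, 3, 4, 5, 6, 7]
>>> chunk_size = 3
>>> [my_list[i:i + chunk_size] for i in range(0, len(my_list), chunk_size)]
[[1, 2, 3], [4, 5, 6], [7]]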

Finally, a solution that works on general iterators and behaves as desired is

import itertools

def grouper(n, iterable):
    it = iter(iterable)
    while True:
        chunk = tuple(itertools.islice(it, n))
        if not chunk:
            return
        yield chunk
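
Used on the example from the question, this yields tuples rather than lists:

>>> list(grouper(3, [1, 2, 3, 4, 5, 6, 7]))
[(1, 2, 3), (4, 5, 6), (7,)]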

Solution 2 - Python

Although the OP asks for the function to return chunks as a list or tuple, in case you need the chunks themselves to be iterators, Sven Marnach's solution can be modified:

import itertools

def grouper_it(n, iterable):
    it = iter(iterable)
    while True:
        chunk_it = itertools.islice(it, n)
        try:
            first_el = next(chunk_it)
        except StopIteration:
            return
        yield itertools.chain((first_el,), chunk_it)

Some benchmarks: http://pastebin.com/YkKFvm8b

It will be slightly more efficient, but only if your code actually iterates through the elements of every chunk.
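
A usage sketch: each yielded chunk is itself an iterator, and all chunks share the underlying iterator, so they should be consumed in order:

>>> for chunk in grouper_it(3, range(7)):
...     print(list(chunk))
...
[0, 1, 2]
[3, 4, 5]
[6]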

Solution 3 - Python

This will work on any iterator (wrap a plain iterable with iter() first). It returns a generator of generators (for full flexibility). I now realize that it's basically the same as @reclosedevs solution, but without the fluff. No need for try...except, as the StopIteration propagates up, which is what we want. (Caveat: since Python 3.7, PEP 479 turns a StopIteration raised inside a generator into a RuntimeError, so on modern Python you do need the try...except after all; see the sketch below.)

The next(iterable) call is needed to raise the StopIteration when the iterable is empty, since islice will continue spawning empty generators forever if you let it.

It's better because it's only two lines long, yet easy to comprehend.

import itertools

def grouper(iterable, n):
    while True:
        yield itertools.chain((next(iterable),), itertools.islice(iterable, n-1))

Note that next(iterable) is put into a tuple. Otherwise, if next(iterable) itself were iterable, then itertools.chain would flatten it out. Thanks to Jeremy Brown for pointing out this issue.
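
Under PEP 479 (Python 3.7+), a stray StopIteration inside a generator becomes a RuntimeError, so a minimal adaptation of the same idea for modern Python might look like the sketch below; admittedly this brings it back close to reclosedev's version above.

import itertools

def grouper(iterable, n):
    it = iter(iterable)
    while True:
        try:
            first = next(it)
        except StopIteration:
            return  # underlying iterator exhausted: stop cleanly instead of leaking StopIteration
        yield itertools.chain((first,), itertools.islice(it, n - 1))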

Solution 4 - Python

I was working on something today and came up with what I think is a simple solution. It is similar to jsbueno's answer, but I believe his would yield empty groups when the length of the iterable is divisible by n. My answer does a simple check when the iterable is exhausted.

def chunk(iterable, chunk_size):
    """Generates lists of `chunk_size` elements from `iterable`.

    >>> list(chunk((2, 3, 5, 7), 3))
    [[2, 3, 5], [7]]
    >>> list(chunk((2, 3, 5, 7), 2))
    [[2, 3], [5, 7]]
    """
    iterable = iter(iterable)
    while True:
        chunk = []
        try:
            for _ in range(chunk_size):
                chunk.append(next(iterable))
            yield chunk
        except StopIteration:
            if chunk:
                yield chunk
            break

Solution 5 - Python

Here's one that returns lazy chunks; use map(list, chunks(...)) if you want lists.

from itertools import islice, chain
from collections import deque

def chunks(items, n):
    items = iter(items)
    for first in items:
        chunk = chain((first,), islice(items, n-1))
        yield chunk
        deque(chunk, 0)  # maxlen=0: exhaust whatever the caller left unconsumed, so the next chunk starts in the right place

if __name__ == "__main__":
    for chunk in map(list, chunks(range(10), 3)):
        print(chunk)

    for i, chunk in enumerate(chunks(range(10), 3)):
        if i % 2 == 1:
            print("chunk #%d: %s" % (i, list(chunk)))
        else:
            print("skipping #%d" % i)

Solution 6 - Python

A succinct implementation is:

from itertools import ifilterfalse, izip_longest  # filterfalse and zip_longest in Python 3

chunker = lambda iterable, n: (ifilterfalse(lambda x: x == (), chunk) for chunk in (izip_longest(*[iter(iterable)]*n, fillvalue=())))

This works because [iter(iterable)]*n is a list containing the same iterator n times; zipping over that takes one item from each iterator in the list, which is the same iterator, with the result that each zip-element contains a group of n items.

izip_longest is needed to fully consume the underlying iterable, rather than iteration stopping when the first exhausted iterator is reached, which chops off any remainder from iterable. This results in the need to filter out the fill-value. A slightly more robust implementation would therefore be:

def chunker(iterable, n):
    class Filler(object): pass
    return (ifilterfalse(lambda x: x is Filler, chunk) for chunk in (izip_longest(*[iter(iterable)]*n, fillvalue=Filler)))

This guarantees that the fill value is never an item in the underlying iterable. Using the definition above (a Python 2 session, where map returns a list):

>>> iterable = range(1, 11)
>>> map(tuple, chunker(iterable, 3))
[(1, 2, 3), (4, 5, 6), (7, 8, 9), (10,)]
>>> map(tuple, chunker(iterable, 2))
[(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)]
>>> map(tuple, chunker(iterable, 4))
[(1, 2, 3, 4), (5, 6, 7, 8), (9, 10)]
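
On Python 3 these functions are spelled filterfalse and zip_longest; a sketch of the same sentinel approach in Python 3 (my adaptation, not part of the original answer) might look like:

from itertools import filterfalse, zip_longest

def chunker(iterable, n):
    class Filler:
        """Unique sentinel type that can never appear in the input."""
    return (filterfalse(lambda x: x is Filler, chunk)
            for chunk in zip_longest(*[iter(iterable)] * n, fillvalue=Filler))

print([tuple(c) for c in chunker(range(1, 11), 3)])
# [(1, 2, 3), (4, 5, 6), (7, 8, 9), (10,)]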

This implementation almost does what you want, but it has issues:

from itertools import islice

def chunks(it, step):
    start = 0
    while True:
        end = start + step
        yield islice(it, start, end)
        start = end

(The difference is that, because islice does not raise StopIteration or anything else on calls that go past the end of it, this will yield forever; there is also the slightly tricky issue that each islice result must be consumed before this generator is advanced again.)

To generate the moving window functionally:

izip(count(0, step), count(step, step))

So this becomes:

(it[start:end] for (start,end) in izip(count(0, step), count(step, step)))

But that still creates an infinite iterator. So you need takewhile (or perhaps something else might be better) to limit it:

from itertools import count, izip, takewhile  # in Python 3, use the built-in zip instead of izip

chunk = lambda it, step: takewhile((lambda x: len(x) > 0), (it[start:end] for (start,end) in izip(count(0, step), count(step, step))))

>>> g = chunk(range(1, 11), 3)
>>> tuple(g)
([1, 2, 3], [4, 5, 6], [7, 8, 9], [10])

Solution 7 - Python

"Simpler is better than complex" - a straightforward generator a few lines long can do the job. Just place it in some utilities module or so:

def grouper(iterable, n):
    iterable = iter(iterable)
    count = 0
    group = []
    while True:
        try:
            group.append(next(iterable))
            count += 1
            if count % n == 0:
                yield group
                group = []
        except StopIteration:
            yield group  # note: yields a final empty list when the length is an exact multiple of n
            break
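
For example (note the trailing empty group when the length divides evenly, as pointed out in an earlier answer):

>>> list(grouper([1, 2, 3, 4, 5, 6, 7], 3))
[[1, 2, 3], [4, 5, 6], [7]]
>>> list(grouper([1, 2, 3, 4, 5, 6], 3))
[[1, 2, 3], [4, 5, 6], []]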

Solution 8 - Python

I forget where I found the inspiration for this. I've modified it a little to work with MSI GUIDs in the Windows Registry:

def nslice(s, n, truncate=False, reverse=False):
    """Splits s into n-sized chunks, optionally reversing the chunks."""
    assert n > 0
    while len(s) >= n:
        if reverse: yield s[:n][::-1]
        else: yield s[:n]
        s = s[n:]
    if len(s) and not truncate:
        yield s

reverse doesn't apply to your question, but it's something I use extensively with this function.

>>> [i for i in nslice([1,2,3,4,5,6,7], 3)]
[[1, 2, 3], [4, 5, 6], [7]]
>>> [i for i in nslice([1,2,3,4,5,6,7], 3, truncate=True)]
[[1, 2, 3], [4, 5, 6]]
>>> [i for i in nslice([1,2,3,4,5,6,7], 3, truncate=True, reverse=True)]
[[3, 2, 1], [6, 5, 4]]

Solution 9 - Python

Here you go.

def chunksiter(l, chunks):
    i, j, n = 0, 0, 0
    rl = []
    while n < len(l)/chunks:
        rl.append(l[i:j+chunks])
        i += chunks
        j += chunks
        n += 1
    return iter(rl)


def chunksiter2(l, chunks):
    i, j, n = 0, 0, 0
    while n < len(l)/chunks:
        yield l[i:j+chunks]
        i += chunks
        j += chunks
        n += 1

# Examples:

for l in chunksiter([1, 2, 3, 4, 5, 6, 7, 8], 3):
    print(l)

[1, 2, 3]
[4, 5, 6]
[7, 8]

for l in chunksiter2([1, 2, 3, 4, 5, 6, 7, 8], 3):
    print(l)

[1, 2, 3]
[4, 5, 6]
[7, 8]

for l in chunksiter2([1, 2, 3, 4, 5, 6, 7, 8], 5):
    print(l)

[1, 2, 3, 4, 5]
[6, 7, 8]

Solution 10 - Python

Since Python 3.8, there is a simpler solution using the := (walrus) operator:

import itertools
from typing import Iterator

def grouper(it: Iterator, n: int) -> Iterator[list]:
    while chunk := list(itertools.islice(it, n)):
        yield chunk

usage:

>>> list(grouper(iter('ABCDEFG'), 3))
[['A', 'B', 'C'], ['D', 'E', 'F'], ['G']]

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Gerenuk | View Question on Stackoverflow
Solution 1 - Python | Sven Marnach | View Answer on Stackoverflow
Solution 2 - Python | reclosedev | View Answer on Stackoverflow
Solution 3 - Python | Svein Lindal | View Answer on Stackoverflow
Solution 4 - Python | eidorb | View Answer on Stackoverflow
Solution 5 - Python | Peter Otten | View Answer on Stackoverflow
Solution 6 - Python | Marcin | View Answer on Stackoverflow
Solution 7 - Python | jsbueno | View Answer on Stackoverflow
Solution 8 - Python | Zach Young | View Answer on Stackoverflow
Solution 9 - Python | Carlos Quintanilla | View Answer on Stackoverflow
Solution 10 - Python | Cédric ROYER | View Answer on Stackoverflow