Length of generator output

PythonGeneratorIterable

Python Problem Overview


Python provides a nice method for getting length of an eager iterable, len(x) that is. But I couldn't find anything similar for lazy iterables represented by generator comprehensions and functions. Of course, it is not hard to write something like:

def iterlen(x):
  n = 0
  try:
    while True:
      next(x)
      n += 1
  except StopIteration: pass
  return n

But I can't get rid of a feeling that I'm reimplementing a bicycle.

(While I was typing the function, a thought struck my mind: maybe there really is no such function, because it "destroys" its argument. Not an issue for my case, though).

P.S.: concerning the first answers - yes, something like len(list(x)) would work too, but that drastically increases the usage of memory.

P.P.S.: re-checked... Disregard the P.S., seems I made a mistake while trying that, it works fine. Sorry for the trouble.

Python Solutions


Solution 1 - Python

The easiest way is probably just sum(1 for _ in gen) where gen is your generator.

Solution 2 - Python

For those who would like to know the summary of that discussion. The final top scores for counting a 50 million-lengthed generator expression using:

  • len(list(gen)),
  • len([_ for _ in gen]),
  • sum(1 for _ in gen),
  • ilen(gen) (from more_itertools),
  • reduce(lambda c, i: c + 1, gen, 0),

sorted by performance of execution (including memory consumption), will make you surprised:

#1: test_list.py:8: 0.492 KiB
    gen = (i for i in data*1000); t0 = monotonic(); len(list(gen))
('list, sec', 1.9684218849870376)

#2: test_list_compr.py:8: 0.867 KiB
    gen = (i for i in data*1000); t0 = monotonic(); len([i for i in gen])
('list_compr, sec', 2.5885991149989422)

#3: test_sum.py:8: 0.859 KiB
    gen = (i for i in data*1000); t0 = monotonic(); sum(1 for i in gen); t1 = monotonic()
('sum, sec', 3.441088170016883)

#4: more_itertools/more.py:413: 1.266 KiB
    d = deque(enumerate(iterable, 1), maxlen=1)
   
    test_ilen.py:10: 0.875 KiB
    gen = (i for i in data*1000); t0 = monotonic(); ilen(gen)
('ilen, sec', 9.812256851990242)

#5: test_reduce.py:8: 0.859 KiB
    gen = (i for i in data*1000); t0 = monotonic(); reduce(lambda counter, i: counter + 1, gen, 0)
('reduce, sec', 13.436614598002052)

So, len(list(gen)) is the most frequent and less memory consumable

Solution 3 - Python

There isn't one because you can't do it in the general case - what if you have a lazy infinite generator? For example:

def fib():
    a, b = 0, 1
    while True:
        a, b = b, a + b
        yield a

This never terminates but will generate the Fibonacci numbers. You can get as many Fibonacci numbers as you want by calling next().

If you really need to know the number of items there are, then you can't iterate through them linearly one time anyway, so just use a different data structure such as a regular list.

Solution 4 - Python

def count(iter):
    return sum(1 for _ in iter)

Or better yet:

def count(iter):
    try:
        return len(iter)
    except TypeError:
        return sum(1 for _ in iter)

If it's not iterable, it will throw a TypeError.

Or, if you want to count something specific in the generator:

def count(iter, key=None):
    if key:
        if callable(key):
            return sum(bool(key(x)) for x in iter)
        return sum(x == key for x in iter)
    try:
        return len(iter)
    except TypeError:
        return sum(1 for _ in iter)

Solution 5 - Python

You can use enumerate() to loop through the generated data stream, then return the last number -- the number of items.

I tried to use itertools.count() with itertools.izip() but no luck. This is the best/shortest answer I've come up with:

#!/usr/bin/python

import itertools

def func():
    for i in 'yummy beer':
        yield i

def icount(ifunc):
    size = -1 # for the case of an empty iterator
    for size, _ in enumerate(ifunc()):
        pass
    return size + 1

print list(func())
print 'icount', icount(func)

# ['y', 'u', 'm', 'm', 'y', ' ', 'b', 'e', 'e', 'r']
# icount 10

Kamil Kisiel's solution is way better:

def count_iterable(i):
    return sum(1 for e in i)

Solution 6 - Python

By definition, only a subset of generators will return after a certain number of arguments (have a pre-defined length), and even then, only a subset of these finite generators have a predictable end (accessing the generator can have side-effects which could stop the generator earlier).

If you wish to implement length methods for your generator, you have to first define what you consider the "length" (is it the total number of elements? the number of remaining elements?), then wrap your generator in a class. Here's an example:

class MyFib(object):
    """
    A class iterator that iterates through values of the
    Fibonacci sequence, until, optionally, a maximum length is reached.
    """

    def __init__(self, length):
        self._length = length
        self._i = 0

     def __iter__(self):
        a, b = 0, 1
        while not self._length or self._i < self._length:
            a, b = b, a + b
            self._i += 1
            yield a

    def __len__(self):
        "This method returns the total number of elements"
        if self._length:
            return self._length
        else:
            raise NotImplementedError("Infinite sequence has no length")
            # or simply return None / 0 depending
            # on implementation

Here is how to use it:

In [151]: mf = MyFib(20)

In [152]: len(mf)
Out[152]: 20

In [153]: l = [n for n in mf]

In [154]: len(l)
Out[154]: 20

In [155]: l
Out[155]: 
[1,
 1,
 2,
...
6765]


In [156]: mf0 = MyFib(0)

In [157]: len(mf0)
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-157-2e89b32ad3e4> in <module>()
----> 1 len(mf0)

/tmp/ipython_edit_TWcV1I.py in __len__(self)
     22             return self._length
     23         else:
---> 24             raise NotImplementedError
     25             # or simply return None / 0 depending
     26             # on implementation

NotImplementedError: 

In [158]: g = iter(mf0)

In [159]: l0 = [g.next(), g.next(), g.next()]

In [160]: l0
Out[160]: [1, 1, 2]

Solution 7 - Python

Use reduce(function, iterable[, initializer]) for a memory efficient purely functional solution:

>>> iter = "This string has 30 characters."
>>> reduce(lambda acc, e: acc + 1, iter, 0)
30

Solution 8 - Python

Try the more_itertools package for a simple solution. Example:

>>> import more_itertools

>>> it = iter("abcde")                                         # sample generator
>>> it
<str_iterator at 0x4ab3630>

>>> more_itertools.ilen(it)
5

See this post for another applied example.

Solution 9 - Python

This is a hack, but if you really want to have len work on a general iterable (consuming it in the way), you can create your own version of len.

The len function is essentially equivalent to the following (though implementations usually provide some optimizations to avoid the extra lookup):

def len(iterable):
    return iterable.__len__()

Therefore we can define our new_len to try that, and if __len__ does not exist, count the number of elements ourselves by consuming the iterable:

def new_len(iterable):
    try:
      return iterable.__len__()
    except AttributeError:
      return sum(1 for _ in iterable)

The above works in Python 2/3, and (as far as I know) should cover every conceivable kind of iterable.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionMaximView Question on Stackoverflow
Solution 1 - PythonMatt DunhamView Answer on Stackoverflow
Solution 2 - PythonAlex-BogdanovView Answer on Stackoverflow
Solution 3 - PythonAdam RosenfieldView Answer on Stackoverflow
Solution 4 - PythonmpenView Answer on Stackoverflow
Solution 5 - PythonView Answer on Stackoverflow
Solution 6 - PythonsleblancView Answer on Stackoverflow
Solution 7 - PythonOlivierBlanvillainView Answer on Stackoverflow
Solution 8 - PythonpylangView Answer on Stackoverflow
Solution 9 - PythonyoniLaviView Answer on Stackoverflow