How do I build a numpy array from a generator?

Python, Numpy, Generator

Python Problem Overview


How can I build a numpy array out of a generator object?

Let me illustrate the problem:

>>> import numpy
>>> def gimme():
...   for x in range(10):
...     yield x
...
>>> gimme()
<generator object gimme at 0x28a1758>
>>> list(gimme())
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> numpy.array(range(10))
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> numpy.array(gimme())
array(<generator object gimme at 0x28a1758>, dtype=object)
>>> numpy.array(list(gimme()))
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In this instance, gimme() is the generator whose output I'd like to turn into an array. However, the array constructor does not iterate over the generator; it simply stores the generator object itself. The behaviour I want is that of numpy.array(list(gimme())), but I don't want to pay the memory overhead of having the intermediate list and the final array in memory at the same time. Is there a more space-efficient way?

Python Solutions


Solution 1 - Python

One Google search beyond this Stack Overflow result, I found numpy.fromiter(iterable, dtype, count=-1). The default count=-1 takes all elements from the iterable; if the length is known, passing it as count lets NumPy pre-allocate the output instead of resizing it on demand. The dtype must be set explicitly. In my case, this worked:

numpy.fromiter(something.generate(from_this_input), float)
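As a minimal sketch of that call, here it is applied to the question's gimme() generator. The int64 dtype and count=10 are choices made for this illustration, not part of the original answer.

```python
import numpy as np

def gimme():
    # Same generator as in the question: yields 0..9 one at a time.
    for x in range(10):
        yield x

# dtype is mandatory; with the default count=-1 the generator is read to exhaustion.
arr = np.fromiter(gimme(), dtype=np.int64)

# If the length is known up front, count lets NumPy pre-allocate the result.
arr_counted = np.fromiter(gimme(), dtype=np.int64, count=10)

print(arr)
print(arr_counted)
```

Note that fromiter builds a 1-D array; each yielded item has to be convertible to the requested dtype.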

Solution 2 - Python

NumPy arrays require their length to be set explicitly at creation time, unlike Python lists. This is necessary so that space for each item can be allocated contiguously in memory. Contiguous allocation is the key feature of NumPy arrays: combined with a native-code implementation, it lets operations on them execute much faster than on regular lists.

Keeping this in mind, it is technically impossible to take a generator object and turn it into an array unless you either:

  1. can predict how many elements it will yield when run:

     my_array = numpy.empty(predict_length())
     for i, el in enumerate(gimme()): my_array[i] = el
    
  2. are willing to store its elements in an intermediate list:

     my_array = numpy.array(list(gimme()))
    
  3. can make two identical generators, run through the first one to find the total length, initialize the array, and then run through the second one to fill in each element:

     length = sum(1 for el in gimme())
     my_array = numpy.empty(length)
     for i, el in enumerate(gimme()): my_array[i] = el
    

Option 1 is probably what you're looking for. Option 2 is space-inefficient, and option 3 is time-inefficient (you have to run the generator twice).
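Options 1 and 3 can be sketched together as one helper that fills a pre-allocated array. The name array_from_generator and its length checks are hypothetical additions for illustration, and the float dtype simply mirrors the default of numpy.empty.

```python
import numpy as np

def gimme():
    # Same generator as in the question: yields 0..9 one at a time.
    for x in range(10):
        yield x

def array_from_generator(gen, length, dtype=float):
    """Fill a pre-allocated array from gen, which must yield exactly `length` items."""
    out = np.empty(length, dtype=dtype)
    filled = 0
    for el in gen:
        if filled >= length:
            raise ValueError("generator yielded more items than expected")
        out[filled] = el
        filled += 1
    if filled != length:
        raise ValueError("generator yielded fewer items than expected")
    return out

# Option 1: the length is known (or predicted) ahead of time.
my_array = array_from_generator(gimme(), 10)

# Option 3: count with one generator instance, then fill from a fresh one.
length = sum(1 for _ in gimme())
my_array_two_pass = array_from_generator(gimme(), length)
```

The explicit checks guard against a wrong length prediction, which would otherwise either overflow the array or leave uninitialized entries at the end.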

Solution 3 - Python

While you can create a 1D array from a generator with numpy.fromiter(), you can create an N-D array from a generator with numpy.stack:

>>> mygen = (numpy.ones((5, 3)) for _ in range(10))
>>> x = numpy.stack(mygen)
>>> x.shape
(10, 5, 3)

It also works for 1D arrays:

>>> numpy.stack(2*i for i in range(10))
array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

Note that numpy.stack internally consumes the generator and builds an intermediate list with arrays = [asanyarray(arr) for arr in arrays], so it does not avoid the memory overhead of a list. The implementation can be found in the NumPy source.

[WARNING] As pointed out by @Joseh Seedy, NumPy 1.16 emits a warning when these functions are passed a generator, which defeats this usage.
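Given that warning, a version-independent sketch is to materialize the generator into a list before stacking. This is not from the original answer, and it makes the intermediate list explicit rather than avoiding it. For a generator of scalars, numpy.fromiter remains the 1-D route; the int64 dtype below is an assumption for the example.

```python
import numpy as np

# Generator yielding ten (5, 3) blocks, as in the N-D example.
mygen = (np.ones((5, 3)) for _ in range(10))

# Passing a real sequence sidesteps the generator warning.
x = np.stack(list(mygen))
print(x.shape)

# For scalars, fromiter consumes the generator directly.
y = np.fromiter((2 * i for i in range(10)), dtype=np.int64)
print(y)
```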

Solution 4 - Python

Somewhat tangential, but if your generator is really a list comprehension, you can often use numpy.where to get your result more efficiently (I discovered this in my own code after seeing this post).
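The answer gives no example, so the following is only one plausible reading of it: a comprehension that chooses between two values per element can be replaced by a single vectorized numpy.where call. The data and the condition are invented for this illustration.

```python
import numpy as np

values = np.arange(10)

# Element-by-element conditional comprehension, then conversion to an array.
via_comprehension = np.array([v * 2 if v % 2 == 0 else -1 for v in values])

# Same selection done in one vectorized call, with no intermediate Python list.
via_where = np.where(values % 2 == 0, values * 2, -1)

print(np.array_equal(via_comprehension, via_where))
```

This only applies when the source data is already an array (or cheaply convertible to one); it does not help with an arbitrary generator.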

Solution 5 - Python

The vstack, hstack, and dstack functions can take generators that yield multi-dimensional arrays as input, subject to the same NumPy 1.16 warning noted in Solution 3.
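To show how the three functions differ in the axis they join along, here is a small sketch. The blocks() generator is hypothetical, and the generator is wrapped in list() so the calls do not depend on how a given NumPy version treats generator arguments.

```python
import numpy as np

def blocks():
    # Hypothetical generator: five 2x3 arrays filled with 0, 1, 2, 3, 4.
    for k in range(5):
        yield np.full((2, 3), k)

v = np.vstack(list(blocks()))  # joined along rows
h = np.hstack(list(blocks()))  # joined along columns
d = np.dstack(list(blocks()))  # joined along a new third axis

print(v.shape, h.shape, d.shape)
```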

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | saffsd | View Question on Stackoverflow
Solution 1 - Python | dhill | View Answer on Stackoverflow
Solution 2 - Python | shsmurfy | View Answer on Stackoverflow
Solution 3 - Python | mdeff | View Answer on Stackoverflow
Solution 4 - Python | Benjamin Horstman | View Answer on Stackoverflow
Solution 5 - Python | Mike R | View Answer on Stackoverflow