Python: How to get the length of itertools _grouper

Python, Group By, Itertools

Python Problem Overview


I'm working with Python's itertools and using groupby to group a bunch of pairs by their last element. The grouping works and I can iterate through the groups just fine, but I would really like to get the length of each group without having to iterate through each one and increment a counter.

The project is to cluster some data points. I'm working with pairs of (numpy.array, int), where the numpy array is a data point and the integer is a cluster label.

Here's my relevant code:

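import itertools

# Each item is a (point, cluster) pair; pair[1] is the cluster label.
# (Tuple-unpacking lambdas such as "lambda (point, cluster): ..." only work in Python 2.)
data = sorted(data, key=lambda pair: pair[1])
for cluster, clusterList in itertools.groupby(data, key=lambda pair: pair[1]):
    if len(clusterList) < minLen: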

On the last line, if len(clusterList) < minLen:, I get this error: TypeError: object of type 'itertools._grouper' has no len()

I've looked up the operations available for _groupers, but can't find anything that seems to provide the length of a group.

Python Solutions


Solution 1 - Python

Just because you call it clusterList doesn't make it a list! It's basically a lazy iterator, returning each item as it's needed. You can convert it to a list like this, though:

clusterList = list(clusterList)

Or do that and get its length in one step:

length = len(list(clusterList))

If you don't want to take up the memory of making it a list, you can do this instead:

length = sum(1 for x in clusterList)

Be aware that the original iterator will be consumed entirely by either converting it to a list or using the sum() formulation.
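
To make the consumption caveat concrete, here is a minimal sketch. The sample pairs are made up for illustration (plain strings stand in for the numpy points from the question). It counts each group with sum() and then shows that the same group iterator has nothing left to yield afterwards.

```python
import itertools

# Hypothetical sample data: (point, cluster_label) pairs, already sorted by label.
data = [("a", 0), ("b", 0), ("c", 1), ("d", 1), ("e", 1)]

sizes = {}
leftovers = {}
for cluster, clusterList in itertools.groupby(data, key=lambda pair: pair[1]):
    # Counting with sum() walks the group iterator to its end.
    sizes[cluster] = sum(1 for _ in clusterList)
    # The iterator is now exhausted, so a second pass sees no items.
    leftovers[cluster] = list(clusterList)

print(sizes)      # {0: 2, 1: 3}
print(leftovers)  # {0: [], 1: []}
```

So if you need the items as well as the count, converting to a list once and reusing that list is probably the simpler choice.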

Solution 2 - Python

clusterList is an iterable, but it is not a list. This can be a little confusing sometimes. You can run a for loop over clusterList, but you can't apply other list operations to it (slicing, len(), etc.).

Fix: assign the result of list(clusterList) to clusterList.
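
Applied to the loop from the question, that fix might look like the sketch below. The sample data, the minLen value, and the small_clusters list are assumptions added for illustration; the question does not show what happens inside the if branch.

```python
import itertools

# Hypothetical sample data: (point, cluster_label) pairs in arbitrary order.
data = [("p1", 2), ("p2", 0), ("p3", 2), ("p4", 1), ("p5", 2), ("p6", 0)]
minLen = 2  # assumed threshold, for illustration only


def label(pair):
    """Return the cluster label, the second element of the pair."""
    return pair[1]


# groupby() only groups consecutive items, so sort by the same key first.
data = sorted(data, key=label)

small_clusters = []  # hypothetical: collect labels of clusters below minLen
for cluster, clusterList in itertools.groupby(data, key=label):
    clusterList = list(clusterList)  # materialize the group so len() works
    if len(clusterList) < minLen:
        small_clusters.append(cluster)

print(small_clusters)  # [1]
```

After the reassignment, clusterList is an ordinary list, so it can also be sliced or iterated more than once inside the loop body.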

Solution 3 - Python

You can use the third-party cardinality package for that. Its count() function counts the number of items that an iterable yields.

> cardinality: determine and check the size of any iterable

The following code gives you the length of clusterList:

import cardinality
cardinality.count(clusterList)

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | user1466679 | View Question on Stackoverflow
Solution 1 - Python | kindall | View Answer on Stackoverflow
Solution 2 - Python | Brian Cain | View Answer on Stackoverflow
Solution 3 - Python | Maryam Bahrami | View Answer on Stackoverflow