How to iterate over Pandas Series generated from groupby().size()

Python Problem Overview


How do you iterate over a Pandas Series generated from a .groupby('...').size() call and get both the group name and the count?

As an example, if I have:

foo
-1     7
 0    85
 1    14
 2     5

how can I loop over them so that in each iteration I would have -1 & 7, 0 & 85, 1 & 14 and 2 & 5 in variables?

I tried the enumerate option but it doesn't quite work. Example:

for i, row in enumerate(df.groupby(['foo']).size()):
    print(i, row)

This gives 0, 1, 2, and 3 for i (the position in the loop) rather than the group names -1, 0, 1, and 2.

Python Solutions


Solution 1 - Python

Update:

Given a pandas Series:

s = pd.Series([1,2,3,4], index=['a', 'b', 'c', 'd'])

s
#a    1
#b    2
#c    3
#d    4
#dtype: int64

You can loop over it directly, which yields one value from the Series in each iteration:

for i in s:
    print(i)
#1
#2
#3
#4

If you want the index at the same time, use the items method, which lazily yields (index, value) pairs. Older code used the iteritems method for the same purpose, but it was deprecated in pandas 1.5 and removed in pandas 2.0:

for i, v in s.items():
    print('index: ', i, 'value: ', v)
#index:  a value:  1
#index:  b value:  2
#index:  c value:  3
#index:  d value:  4

# pandas < 2.0 only: iteritems() was removed in pandas 2.0, use items() instead
for i, v in s.iteritems():
    print('index: ', i, 'value: ', v)
#index:  a value:  1
#index:  b value:  2
#index:  c value:  3
#index:  d value:  4
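Because items() is lazy, it can help to materialize the pairs when you want to inspect or reuse them. The following is a minimal sketch using the same Series; the names pairs and mapping are chosen here for illustration only.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])

# items() yields (index, value) pairs lazily; list() or dict() collects them.
pairs = list(s.items())
mapping = dict(s.items())

print(pairs)
print(mapping)
```

The dict form is only safe when the index labels are unique, since duplicate labels would overwrite each other.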

Old Answer:

You can call the iteritems() method on the Series (on pandas 2.0 and later, where iteritems() no longer exists, call items() instead):

for i, row in df.groupby('a').size().iteritems():
    print(i, row)

# 12 4
# 14 2

According to doc:

> Series.iteritems()
>
> Lazily iterate over (index, value) tuples

Note: This is not the same data as in the question, just a demo.
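Applied to the case in the question, the same items() loop might look like the sketch below. The DataFrame here is hypothetical, built only so that the group sizes match the counts shown in the question.

```python
import pandas as pd

# Hypothetical data: 7 rows of -1, 85 of 0, 14 of 1 and 5 of 2,
# so the group sizes match the output shown in the question.
df = pd.DataFrame({'foo': [-1] * 7 + [0] * 85 + [1] * 14 + [2] * 5})

sizes = df.groupby('foo').size()

pairs = []
for name, count in sizes.items():
    # name is the group label, count is the number of rows in that group
    pairs.append((name, count))
    print(name, count)
```

Each iteration gives the group name and its count directly, with no need for enumerate().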

Solution 2 - Python

To expand upon Psidom's answer, there are three useful ways to unpack data from a pd.Series. Using the same Series as in that answer:

s = pd.Series([1,2,3,4], index=['a', 'b', 'c', 'd'])

  • A direct loop over s yields the value of each row.
  • A loop over s.items() yields an (index, value) tuple for each row (s.iteritems() did the same before it was removed in pandas 2.0).
  • Using enumerate() on s.items() yields a nested tuple of the form (rownum, (index, value)).

The last way is useful when your index carries information other than the row number itself (e.g. a time series, where the index is a timestamp).

s = pd.Series([1,2,3,4], index=['a', 'b', 'c', 'd'])

# items() replaces iteritems(), which was removed in pandas 2.0
for rownum, (indx, val) in enumerate(s.items()):
    print('row number: ', rownum, 'index: ', indx, 'value: ', val)

will output:

row number:  0 index:  a value:  1
row number:  1 index:  b value:  2
row number:  2 index:  c value:  3
row number:  3 index:  d value:  4

The (indx, val) pattern in the loop header relies on Python's nested tuple unpacking.
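For the time series case mentioned above, a minimal sketch could look like this. The dates and readings are made up for illustration, and the names ts, when, and rows are hypothetical.

```python
import pandas as pd

# Hypothetical daily readings indexed by date (values invented for the example).
ts = pd.Series(
    [10.0, 12.5, 11.0],
    index=pd.date_range('2024-01-01', periods=3, freq='D'),
)

rows = []
for rownum, (when, val) in enumerate(ts.items()):
    # rownum is the position, when is a pandas Timestamp, val is the reading
    rows.append((rownum, when.strftime('%Y-%m-%d'), val))
    print(rownum, when.date(), val)
```

Here enumerate() supplies the row position while the index still provides the timestamp, so neither piece of information is lost.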

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Reily Bourne | View Question on Stackoverflow
Solution 1 - Python | Psidom | View Answer on Stackoverflow
Solution 2 - Python | dbouz | View Answer on Stackoverflow