How to GroupBy a Dataframe in Pandas and keep Columns
PythonPandasPython Problem Overview
given a dataframe that logs uses of some books like this:
Name Type ID
Book1 ebook 1
Book2 paper 2
Book3 paper 3
Book1 ebook 1
Book2 paper 2
I need to get the count of all the books, keeping the other columns and get this:
Name Type ID Count
Book1 ebook 1 2
Book2 paper 2 2
Book3 paper 3 1
How can this be done?
Thanks!
Python Solutions
Solution 1 - Python
You want the following:
In [20]:
df.groupby(['Name','Type','ID']).count().reset_index()
Out[20]:
Name Type ID Count
0 Book1 ebook 1 2
1 Book2 paper 2 2
2 Book3 paper 3 1
In your case the 'Name', 'Type' and 'ID' cols match in values so we can groupby
on these, call count
and then reset_index
.
An alternative approach would be to add the 'Count' column using transform
and then call drop_duplicates
:
In [25]:
df['Count'] = df.groupby(['Name'])['ID'].transform('count')
df.drop_duplicates()
Out[25]:
Name Type ID Count
0 Book1 ebook 1 2
1 Book2 paper 2 2
2 Book3 paper 3 1
Solution 2 - Python
I think as_index=False should do the trick.
df.groupby(['Name','Type','ID'], as_index=False).count()
Solution 3 - Python
If you have many columns in a df it makes sense to use df.groupby(['foo']).agg(...)
, see here. The .agg()
function allows you to choose what to do with the columns you don't want to apply operations on. If you just want to keep them, use .agg({'col1': 'first', 'col2': 'first', ...}
. Instead of 'first'
, you can also apply 'sum'
, 'mean'
and others.