group by pandas dataframe and select latest in each group
PythonPandasGroup ByPandas GroupbyPython Problem Overview
How to group values of pandas dataframe and select the latest(by date) from each group?
For example, given a dataframe sorted by date:
id product date
0 220 6647 2014-09-01
1 220 6647 2014-09-03
2 220 6647 2014-10-16
3 826 3380 2014-11-11
4 826 3380 2014-12-09
5 826 3380 2015-05-19
6 901 4555 2014-09-01
7 901 4555 2014-10-05
8 901 4555 2014-11-01
grouping by id or product, and selecting the earliest gives:
id product date
2 220 6647 2014-10-16
5 826 3380 2015-05-19
8 901 4555 2014-11-01
Python Solutions
Solution 1 - Python
You can also use tail
with groupby to get the last n values of the group:
df.sort_values('date').groupby('id').tail(1)
id product date
2 220 6647 2014-10-16
8 901 4555 2014-11-01
5 826 3380 2015-05-19
Solution 2 - Python
use idxmax
in groupby
and slice df
with loc
df.loc[df.groupby('id').date.idxmax()]
id product date
2 220 6647 2014-10-16
5 826 3380 2015-05-19
8 901 4555 2014-11-01
Solution 3 - Python
I had a similar problem and ended up using drop_duplicates
rather than groupby
.
It seems to run significatively faster on large datasets when compared with other methods suggested above.
df.sort_values(by="date").drop_duplicates(subset=["id"], keep="last")
id product date
2 220 6647 2014-10-16
8 901 4555 2014-11-01
5 826 3380 2015-05-19
Solution 4 - Python
Given a dataframe sorted by date, you can obtain what you ask for in a number of ways:
Like this:
df.groupby(['id','product']).last()
like this:
df.groupby(['id','product']).nth(-1)
or like this:
df.groupby(['id','product']).max()
If you don't want id
and product
to appear as index use groupby(['id', 'product'], as_index=False)
.
Alternatively use:
df.groupby(['id','product']).tail(1)
Solution 5 - Python
To use .tail()
as an aggregation method and keep your grouping intact:
df.sort_values('date').groupby('id').apply(lambda x: x.tail(1))
id product date
id
220 2 220 6647 2014-10-16
826 5 826 3380 2015-05-19
901 8 901 4555 2014-11-01