group by week in pandas

PythonPandasDateGroup By

Python Problem Overview


I'm having this data frame:

Name   Date    Quantity
Apple  07/11/17  20
orange 07/14/17  20
Apple  07/14/17  70
Orange 07/25/17  40
Apple  07/20/17  30

I want to aggregate this by Name and Date to get sum of quantities Details:

Date: Group, the result should be at the beginning of the week (or just on Monday)

Quantity: Sum, if two or more records have same Name and Date (if falls on same interval)

The desired output is given below:

Name   Date    Quantity
Apple  07/10/17  90
orange 07/10/17  20
Apple  07/17/17  30
orange 07/24/17  40

Thanks in advance

Python Solutions


Solution 1 - Python

First convert column date to_datetime and substract one week, as we want to sum for the week ahead of the date, not the week before that date.

Then use groupby with Grouper by W-MON and aggregate sum:

df['Date'] = pd.to_datetime(df['Date']) - pd.to_timedelta(7, unit='d')
df = df.groupby(['Name', pd.Grouper(key='Date', freq='W-MON')])['Quantity']
       .sum()
       .reset_index()
       .sort_values('Date')
print (df)
     Name       Date  Quantity
0   Apple 2017-07-10        90
3  orange 2017-07-10        20
1   Apple 2017-07-17        30
2  Orange 2017-07-24        40

Solution 2 - Python

Let's use groupby, resample with W-Mon, and sum:

df.groupby('Name').resample('W-Mon', on='Date').sum().reset_index().sort_values(by='Date')

Output:

     Name       Date  Quantity
0   Apple 2017-07-17        90
3  orange 2017-07-17        20
1   Apple 2017-07-24        30
2  Orange 2017-07-31        40

Solution 3 - Python

First convert column date to_datetime. This will group by week starting with Mondays. It will output the week number (but you can change that looking up in

> http://strftime.org/

df.groupby(['name', df['date'].dt.strftime('%W')])['quantity'].sum()

Output:

name    date
apple   28      90
        29      30
orange  28      20
        30      40

Solution 4 - Python

This groups every row on the previous Monday (if the date is already Monday, nothing is changed). This has the effect of grouping by week:

import pandas as pd, datetime as dt

# df = ...

df['WeekDate'] = df.apply(lambda row: row['Date'] - dt.timedelta(days=row['Date'].weekday()), axis=1)

perweek = df['WeekDate'].groupby(df['WeekDate']).count()

Exemple:

Date           WeekDate
2020-06-20     2020-06-15 <- monday
2020-06-21     2020-06-15
2020-06-24     2020-06-22 <- monday 
2020-06-25     2020-06-22
2020-06-26     2020-06-22

Solution 5 - Python

You already received a lot of good answer and the question is quite old, but, given the fact some of the solutions use deprecated functions and I encounted the same problem and found a different solution I think could be helpful to someone to share it.

Given the dataframe you proposed:

Name   Date    Quantity
Apple  07/11/17  20
orange 07/14/17  20
Apple  07/14/17  70
Orange 07/25/17  40
Apple  07/20/17  30

We have to convert the values in 'Date' as Pandas' Datetime since they are strings right now.
Then we can use the Series' dt property that allow us to handle DateTime-like series and extract informations.

df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%y')

By having a DateTime format allow us to use the dt parameters to extract the number of the week associated to the date. In order to do not loose any information I prefer to add a new column with the week number. Once retrieved the number of the week we can group by that week.

df['WeekNumber'] = df['Date'].dt.isocalendar().week
df.groupby(['Name', 'WeekNumber']).sum()

Name	WeekNumber	
Apple	28	90
        29	30
Orange	28	20
        30	40
Small problem: what if we consider different years?

There could be the case in whick our data have a range of years, in that situation we cannot consider only the week (otherwise we would mix up data from one year into another), so it would be useful to extract also the year column from isocalendar().

df['year'] = df['Date'].dt.isocalendar().year
df.groupby(['Name', 'WeekNumber', 'year']).sum()

Name	WeekNumber	year	Quantity
Apple	28	        2017	90
        29	        2017	30
Orange	28	        2017	20
        30	        2017	40

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionRamView Question on Stackoverflow
Solution 1 - PythonjezraelView Answer on Stackoverflow
Solution 2 - PythonScott BostonView Answer on Stackoverflow
Solution 3 - PythonLN_PView Answer on Stackoverflow
Solution 4 - PythonBasjView Answer on Stackoverflow
Solution 5 - PythonAndrePView Answer on Stackoverflow