Python Pandas: Group datetime column into hour and minute aggregations

Python, Date, Pandas

Python Problem Overview


This seems like it should be fairly straightforward, but after nearly an entire day I have not found the solution. I've loaded my DataFrame with read_csv and parsed, combined and indexed a date column and a time column into a single column. Now I want to reshape the data and perform calculations based on hour and minute groupings, similar to what you can do in an Excel pivot table.

I know how to resample to hours or minutes, but that keeps the date associated with each hour/minute. I want to aggregate the data set ONLY by hour and minute, like grouping in an Excel pivot table and selecting "hour" and "minute" but nothing else.

Any help would be greatly appreciated.
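To make the difference concrete, here is a small sketch on made-up data (the same two clock times observed on two different days), assuming a reasonably recent pandas where "min" is the minute alias for resample. resample creates one bin per calendar minute between the first and last timestamp, so the two days stay separate and the empty minutes in between get bins too. Grouping on the index's hour and minute attributes folds every day onto the same clock time, which is the pivot-style result being asked for.

```python
import pandas as pd

# Made-up sample: the same two clock times on two different days.
idx = pd.to_datetime([
    "2013-01-01 09:15", "2013-01-01 09:16",
    "2013-01-02 09:15", "2013-01-02 09:16",
])
s = pd.Series([1, 2, 3, 4], index=idx)

# resample keeps the date: one bin per calendar minute, including the
# empty minutes between the two days (those bins sum to 0).
by_calendar_minute = s.resample("min").sum()

# Grouping on hour and minute only collapses all days onto the same clock time.
by_clock_minute = s.groupby([s.index.hour, s.index.minute]).sum()

print(len(by_calendar_minute))  # one row per calendar minute
print(by_clock_minute)          # one row per (hour, minute)
```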

Python Solutions


Solution 1 - Python

Can't you do the following, where df is your DataFrame?

import pandas as pd
times = pd.to_datetime(df.timestamp_col)  # parse strings to datetime64 if needed
df.groupby([times.dt.hour, times.dt.minute]).value_col.sum()  # the date is ignored
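A self-contained run of the snippet above on made-up data. timestamp_col and value_col are the placeholder names from the answer; the rename calls are an addition here so the two index levels read as hour and minute instead of both inheriting the column name.

```python
import pandas as pd

# Made-up data; timestamp_col holds strings that pd.to_datetime can parse.
df = pd.DataFrame({
    "timestamp_col": ["2013-01-01 09:15:10", "2013-01-01 09:15:50",
                      "2013-01-02 09:15:30", "2013-01-02 10:05:00"],
    "value_col": [1, 2, 3, 4],
})

times = pd.to_datetime(df.timestamp_col)
# Rename the key Series so the result's index levels are labeled.
totals = df.groupby([times.dt.hour.rename("hour"),
                     times.dt.minute.rename("minute")]).value_col.sum()
print(totals)
```

Rows from different days that share the same clock minute (here 09:15 on both days) end up in one group.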

Solution 2 - Python

Wes' code didn't work for me, but pd.DatetimeIndex did:

import pandas as pd
times = pd.DatetimeIndex(df.datetime_col)  # one timestamp per row of df
grouped = df.groupby([times.hour, times.minute])

A DatetimeIndex is pandas' representation of an array of timestamps. The first line parses the column into a DatetimeIndex. The second line takes its hour and minute attributes, which hold one value per row, and uses them as grouping keys so the rows can be grouped by those values.
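A minimal runnable version of this approach, with made-up data in a DataFrame called df (the column names are placeholders). Because times.hour and times.minute are plain array-like Index objects rather than Series, groupby pairs them with the rows of df by position, so they must be built from the same DataFrame in the same row order.

```python
import pandas as pd

# Made-up data; datetime_col holds parseable timestamp strings.
df = pd.DataFrame({
    "datetime_col": ["2013-01-01 09:15", "2013-01-02 09:15", "2013-01-02 09:30"],
    "value_col": [10, 20, 30],
})

times = pd.DatetimeIndex(df.datetime_col)
# times.hour and times.minute have one entry per row, matched to df by position.
grouped = df.groupby([times.hour, times.minute])
means = grouped.value_col.mean()
print(means)
```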

Solution 3 - Python

I came across this while searching for this type of groupby. Wes' code above didn't work for me; I'm not sure whether that's because of changes in pandas over time.

In pandas 0.16.2, what I did in the end was:

grp = data.groupby(by=[data.datetime_col.map(lambda x: (x.hour, x.minute))])
grp.count()
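A runnable sketch of the tuple-key version on made-up data. It assumes datetime_col already has a datetime64 dtype, since the lambda calls .hour and .minute on each Timestamp.

```python
import pandas as pd

# Made-up data; datetime_col must already be datetime64 so x.hour works.
data = pd.DataFrame({
    "datetime_col": pd.to_datetime(["2013-01-01 09:15", "2013-01-02 09:15",
                                    "2013-01-02 17:45"]),
    "value_col": [1, 2, 3],
})

# One key per row: an (hour, minute) tuple built from each Timestamp.
keys = data.datetime_col.map(lambda x: (x.hour, x.minute))
grp = data.groupby(by=[keys])
counts = grp.count()
print(counts)
```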

You'd get (hour, minute) tuples as the group index. If you want a MultiIndex instead:

grp = data.groupby(by=[data.datetime_col.map(lambda x: x.hour),
                       data.datetime_col.map(lambda x: x.minute)])
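Since the question asks for something like an Excel pivot table, here is a sketch (made-up data, datetime_col assumed to be datetime64 already) that takes the MultiIndex result and unstacks the minute level into columns, giving hours as rows and minutes as columns. Setting the level names is an addition here; without it both levels carry the name datetime_col. Hour/minute combinations that never occur come out as NaN.

```python
import pandas as pd

# Made-up data; datetime_col is already datetime64.
data = pd.DataFrame({
    "datetime_col": pd.to_datetime(["2013-01-01 09:15", "2013-01-02 09:15",
                                    "2013-01-02 09:30", "2013-01-03 10:15"]),
    "value_col": [1, 2, 3, 4],
})

grp = data.groupby(by=[data.datetime_col.map(lambda x: x.hour),
                       data.datetime_col.map(lambda x: x.minute)])
sums = grp.value_col.sum()
sums.index.names = ["hour", "minute"]

# Move the minute level into columns: hours as rows, minutes as columns.
pivot = sums.unstack("minute")
print(pivot)
```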

Solution 4 - Python

Here is an alternative to Wes's and Nix's answers above that needs only one line of code. Assuming your column is already a datetime column, you don't need to get the hour and minute attributes separately:

df.groupby(df.timestamp_col.dt.time).value_col.sum()
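One thing to watch with this shortcut: dt.time returns datetime.time objects that keep seconds (and microseconds), so timestamps in the same minute but with different seconds land in different groups. The sketch below uses made-up data to show that, plus one possible workaround that is an assumption here rather than part of the answer: flooring to the minute with dt.floor("min") before taking dt.time.

```python
import datetime

import pandas as pd

# Made-up data; timestamp_col is already datetime64.
df = pd.DataFrame({
    "timestamp_col": pd.to_datetime(["2013-01-01 09:15:00", "2013-01-02 09:15:00",
                                     "2013-01-02 09:15:40"]),
    "value_col": [1, 2, 3],
})

# dt.time keeps seconds, so 09:15:40 forms its own group.
by_time = df.groupby(df.timestamp_col.dt.time).value_col.sum()

# Flooring to the minute first drops seconds, giving pure hour/minute groups.
by_minute = df.groupby(df.timestamp_col.dt.floor("min").dt.time).value_col.sum()

print(by_time)
print(by_minute)
```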

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | horatio1701d | View Question on Stackoverflow
Solution 1 - Python | Wes McKinney | View Answer on Stackoverflow
Solution 2 - Python | Nix G-D | View Answer on Stackoverflow
Solution 3 - Python | WillZ | View Answer on Stackoverflow
Solution 4 - Python | tsando | View Answer on Stackoverflow