What is the fastest way to truncate timestamps to 5 minutes in Postgres?

SqlPerformancePostgresqlDatetime

Sql Problem Overview


Postgres can round (truncate) timestamps using the date_trunc function, like this:

date_trunc('hour', val)
date_trunc('minute', val)

I'm looking for a way to truncate a timestamp to the nearest 5-minute boundary so, for example, 14:26:57 becomes 14:25:00. The straightforward way to do it is like this:

date_trunc('hour', val) + date_part('minute', val)::int / 5 * interval '5 min'

Since this is a performance-critical part of the query, I'm wondering whether this is the fastest solution, or whether there's some shortcut (compatible with Postgres 8.1+) that I've overlooked.

Sql Solutions


Solution 1 - Sql

I was wondering the same thing. I found two alternative ways of doing this, but the one you suggested was faster.

I informally benchmarked against one of our larger tables. I limited the query to the first 4 million rows. I alternated between the two queries in order to avoid giving one a unfair advantage due to db caching.


Going through epoch/unix time

SELECT to_timestamp(
    floor(EXTRACT(epoch FROM ht.time) / EXTRACT(epoch FROM interval '5 min'))
    * EXTRACT(epoch FROM interval '5 min')
) FROM huge_table AS ht LIMIT 4000000

(Note this produces timestamptzeven if you used a time zone unaware datatype)

Results

  • Run 1: 39.368 seconds
  • Run 3: 39.526 seconds
  • Run 5: 39.883 seconds

Using date_trunc and date_part

SELECT 
    date_trunc('hour', ht.time) 
    + date_part('minute', ht.time)::int / 5 * interval '5 min'
FROM huge_table AS ht LIMIT 4000000

Results

  • Run 2: 34.189 seconds
  • Run 4: 37.028 seconds
  • Run 6: 32.397 seconds

System

  • DB version: PostgreSQL 9.6.2 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2, 64-bit
  • Cores: Intel® Xeon®, E5-1650v2, Hexa-Core
  • RAM: 64 GB, DDR3 ECC RAM

Conclusion

Your version seems to be faster. But not fast enough for my specific use case. The advantage of not having to specify the hour makes the epoch version more versatile and produces simpler parameterization in client side code. It handles 2 hour intervals just as well as 5 minute intervals without having to bump the date_trunc time unit argument up. On a end note, I wish this time unit argument was changed to a time interval argument instead.

Solution 2 - Sql

I don't think there is any quicker method.

And I don't think you should be worried about the performance of the expression.

Everything else that is involved in executing your (SELECT, UPDATE, ...) statement is most probably a lot more expensive (e.g. the I/O to retrieve rows) than that date/time calculation.

Solution 3 - Sql

Full query for those wondering (based on @DNS question):

Assuming you have orders and you want to count them by slices of 5min and shop_id:

SELECT date_trunc('hour', created_at) + date_part('minute', created_at)::int / 5 * interval '5 min' AS minute
      , shop_id, count(id) as orders_count
FROM orders
GROUP BY 1, shop_id
ORDER BY 1 ASC

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionDNSView Question on Stackoverflow
Solution 1 - SqlAndré C. AndersenView Answer on Stackoverflow
Solution 2 - Sqla_horse_with_no_nameView Answer on Stackoverflow
Solution 3 - SqlBenjamin CrouzierView Answer on Stackoverflow