PostgreSQL query to count/group by day and display days with no data
SqlPostgresqlJoinGroup BySql Problem Overview
I need to create a PostgreSQL query that returns
- a day
- the number of objects found for that day
It's important that every single day appear in the results, even if no objects were found on that day. (This has been discussed before but I haven't been able to get things working in my specific case.)
First, I found a sql query to generate a range of days, with which I can join:
SELECT to_char(date_trunc('day', (current_date - offs)), 'YYYY-MM-DD')
AS date
FROM generate_series(0, 365, 1)
AS offs
Results in:
date
------------
2013-03-28
2013-03-27
2013-03-26
2013-03-25
...
2012-03-28
(366 rows)
Now I'm trying to join that to a table named 'sharer_emailshare' which has a 'created' column:
Table 'public.sharer_emailshare'
column | type
-------------------
id | integer
created | timestamp with time zone
message | text
to | character varying(75)
Here's the best GROUP BY
query I have so far:
SELECT d.date, count(se.id) FROM (
select to_char(date_trunc('day', (current_date - offs)), 'YYYY-MM-DD')
AS date
FROM generate_series(0, 365, 1)
AS offs
) d
JOIN sharer_emailshare se
ON (d.date=to_char(date_trunc('day', se.created), 'YYYY-MM-DD'))
GROUP BY d.date;
The results:
date | count
------------+-------
2013-03-27 | 11
2013-03-24 | 2
2013-02-14 | 2
(3 rows)
Desired results:
date | count
------------+-------
2013-03-28 | 0
2013-03-27 | 11
2013-03-26 | 0
2013-03-25 | 0
2013-03-24 | 2
2013-03-23 | 0
...
2012-03-28 | 0
(366 rows)
If I understand correctly this is because I'm using a plain (implied INNER
) JOIN
, and this is the expected behavior, as discussed in the postgres docs.
I've looked through dozens of StackOverflow solutions, and all the ones with working queries seem specific to MySQL/Oracle/MSSQL and I'm having a hard time translating them to PostgreSQL.
The guy asking this question found his answer, with Postgres, but put it on a pastebin link that expired some time ago.
I've tried to switch to LEFT OUTER JOIN
, RIGHT JOIN
, RIGHT OUTER JOIN
, CROSS JOIN
, use a CASE
statement to sub in another value if null, COALESCE
to provide a default value, etc, but I haven't been able to use them in a way that gets me what I need.
Any assistance is appreciated! And I promise I'll get around to reading that giant PostgreSQL book soon ;)
Sql Solutions
Solution 1 - Sql
You just need a left outer join
instead of an inner join:
SELECT d.date, count(se.id)
FROM
(
SELECT to_char(date_trunc('day', (current_date - offs)), 'YYYY-MM-DD') AS date
FROM generate_series(0, 365, 1) AS offs
) d
LEFT OUTER JOIN sharer_emailshare se
ON d.date = to_char(date_trunc('day', se.created), 'YYYY-MM-DD')
GROUP BY d.date;
Solution 2 - Sql
Extending Gordon Linoff's helpful answer, I would suggest a couple of improvements such as:
- Use
::date
instead ofdate_trunc('day', ...)
- Join on a date type rather than a character type (it's cleaner).
- Use specific date ranges so they're easier to change later. In this case I select a year before the most recent entry in the table - something that couldn't have been done easily with the other query.
- Compute the totals for an arbitrary subquery (using a CTE). You just have to cast the column of interest to the date type and call it date_column.
- Include a column for cumulative total. (Why not?)
Here's my query:
WITH dates_table AS (
SELECT created::date AS date_column FROM sharer_emailshare WHERE showroom_id=5
)
SELECT series_table.date, COUNT(dates_table.date_column), SUM(COUNT(dates_table.date_column)) OVER (ORDER BY series_table.date) FROM (
SELECT (last_date - b.offs) AS date
FROM (
SELECT GENERATE_SERIES(0, last_date - first_date, 1) AS offs, last_date from (
SELECT MAX(date_column) AS last_date, (MAX(date_column) - '1 year'::interval)::date AS first_date FROM dates_table
) AS a
) AS b
) AS series_table
LEFT OUTER JOIN dates_table
ON (series_table.date = dates_table.date_column)
GROUP BY series_table.date
ORDER BY series_table.date
I tested the query, and it produces the same results, plus the column for cumulative total.
Solution 3 - Sql
I'll try to provide an answer that includes some explanation. I'll start with the smallest building block and work up.
If you run a query like this:
SELECT series.number FROM generate_series(0, 9) AS series(number)
You get output like this:
number
--------
0
1
2
3
4
5
6
7
8
9
(10 rows)
This can be turned into dates like this:
SELECT CURRENT_DATE + sequential_dates.date AS date
FROM generate_series(0, 9) AS sequential_dates(date)
Which will give output like this:
date
------------
2019-09-29
2019-09-30
2019-10-01
2019-10-02
2019-10-03
2019-10-04
2019-10-05
2019-10-06
2019-10-07
2019-10-08
(10 rows)
Then you can do a query like this (for example), joining the original query as a subquery against whatever table you're ultimately interested in:
SELECT sequential_dates.date,
COUNT(calendar_items.*) AS calendar_item_count
FROM (SELECT CURRENT_DATE + sequential_dates.date AS date
FROM generate_series(0, 9) AS sequential_dates(date)) sequential_dates
LEFT JOIN calendar_items ON calendar_items.starts_at::date = sequential_dates.date
GROUP BY sequential_dates.date
Which will give output like this:
date | calendar_item_count
------------+---------------------
2019-09-29 | 1
2019-09-30 | 8
2019-10-01 | 15
2019-10-02 | 11
2019-10-03 | 1
2019-10-04 | 12
2019-10-05 | 0
2019-10-06 | 0
2019-10-07 | 27
2019-10-08 | 24
Solution 4 - Sql
Based on Gordon Linoff's answer I realized another problem was that I had a WHERE
clause that I didn't mention in the original question.
Instead of a naked WHERE
, I made a subquery:
SELECT d.date, count(se.id) FROM (
select to_char(date_trunc('day', (current_date - offs)), 'YYYY-MM-DD')
AS date
FROM generate_series(0, 365, 1)
AS offs
) d
LEFT OUTER JOIN (
SELECT * FROM sharer_emailshare
WHERE showroom_id=5
) se
ON (d.date=to_char(date_trunc('day', se.created), 'YYYY-MM-DD'))
GROUP BY d.date;
Solution 5 - Sql
I like Jason Swett SQL however ran into issue where the count on some dates should be a zero rather than a one. Running the statment select count(*) from public.post_call_info where timestamp::date = '2020-11-23' count = zero, but below equals a one.
Also the + give me a forward schedule so changed to a minus provide 9 days data prior to current date.
SELECT sequential_dates.date,
COUNT(*) AS call_count
FROM (SELECT CURRENT_DATE - sequential_dates.date AS date
FROM generate_series(0, 9) AS sequential_dates(date)) sequential_dates
LEFT JOIN public.post_call_info ON public.post_call_info.timestamp::date =
sequential_dates.date
GROUP BY sequential_dates.date
order by date desc