How to do a Postgresql subquery in select clause with join in from clause like SQL Server?

SqlSql ServerPostgresqlSubquery

Sql Problem Overview


I am trying to write the following query on postgresql:

select name, author_id, count(1), 
	(select count(1)
	from names as n2
	where n2.id = n1.id
		and t2.author_id = t1.author_id
	)				
from names as n1
group by name, author_id

This would certainly work on Microsoft SQL Server but it does not at all on postegresql. I read its documentation a bit and it seems I could rewrite it as:

select name, author_id, count(1), total						
from names as n1, (select count(1) as total
	from names as n2
	where n2.id = n1.id
		and n2.author_id = t1.author_id
	) as total
group by name, author_id

But that returns the following error on postegresql: "subquery in FROM cannot refer to other relations of same query level". So I'm stuck. Does anyone know how I can achieve that?

Thanks

Sql Solutions


Solution 1 - Sql

I'm not sure I understand your intent perfectly, but perhaps the following would be close to what you want:

select n1.name, n1.author_id, count_1, total_count
  from (select id, name, author_id, count(1) as count_1
          from names
          group by id, name, author_id) n1
inner join (select id, author_id, count(1) as total_count
              from names
              group by id, author_id) n2
  on (n2.id = n1.id and n2.author_id = n1.author_id)

Unfortunately this adds the requirement of grouping the first subquery by id as well as name and author_id, which I don't think was wanted. I'm not sure how to work around that, though, as you need to have id available to join in the second subquery. Perhaps someone else will come up with a better solution.

Solution 2 - Sql

Complementing @Bob Jarvis and @dmikam answer, Postgres don't perform a good plan when you don't use LATERAL, below a simulation, in both cases the query data results are the same, but the cost are very different

Table structure

CREATE TABLE ITEMS (
	N INTEGER NOT NULL,
	S TEXT NOT NULL
);

INSERT INTO ITEMS
  SELECT
    (random()*1000000)::integer AS n,
    md5(random()::text) AS s
  FROM
    generate_series(1,1000000);

CREATE INDEX N_INDEX ON ITEMS(N);

Performing JOIN with GROUP BY in subquery without LATERAL

EXPLAIN 
SELECT 
	I.*
FROM ITEMS I
INNER JOIN (
	SELECT 
		COUNT(1), n
	FROM ITEMS
	GROUP BY N
) I2 ON I2.N = I.N
WHERE I.N IN (243477, 997947);

The results

Merge Join  (cost=0.87..637500.40 rows=23 width=37)
  Merge Cond: (i.n = items.n)
  ->  Index Scan using n_index on items i  (cost=0.43..101.28 rows=23 width=37)
        Index Cond: (n = ANY ('{243477,997947}'::integer[]))
  ->  GroupAggregate  (cost=0.43..626631.11 rows=861418 width=12)
        Group Key: items.n
        ->  Index Only Scan using n_index on items  (cost=0.43..593016.93 rows=10000000 width=4)

Using LATERAL

EXPLAIN 
SELECT 
	I.*
FROM ITEMS I
INNER JOIN LATERAL (
	SELECT 
		COUNT(1), n
	FROM ITEMS
	WHERE N = I.N
	GROUP BY N
) I2 ON 1=1 --I2.N = I.N
WHERE I.N IN (243477, 997947);

Results

Nested Loop  (cost=9.49..1319.97 rows=276 width=37)
  ->  Bitmap Heap Scan on items i  (cost=9.06..100.20 rows=23 width=37)
        Recheck Cond: (n = ANY ('{243477,997947}'::integer[]))
        ->  Bitmap Index Scan on n_index  (cost=0.00..9.05 rows=23 width=0)
              Index Cond: (n = ANY ('{243477,997947}'::integer[]))
  ->  GroupAggregate  (cost=0.43..52.79 rows=12 width=12)
        Group Key: items.n
        ->  Index Only Scan using n_index on items  (cost=0.43..52.64 rows=12 width=4)
              Index Cond: (n = i.n)

My Postgres version is PostgreSQL 10.3 (Debian 10.3-1.pgdg90+1)

Solution 3 - Sql

I know this is old, but since [Postgresql 9.3][1] there is an option to use a keyword "LATERAL" to use RELATED subqueries inside of JOINS, so the query from the question would look like:

SELECT 
    name, author_id, count(*), t.total
FROM
    names as n1
    INNER JOIN LATERAL (
        SELECT 
            count(*) as total
        FROM 
            names as n2
        WHERE 
            n2.id = n1.id
            AND n2.author_id = n1.author_id
    ) as t ON 1=1
GROUP BY 
    n1.name, n1.author_id

[1]: https://www.postgresql.org/docs/9.3/static/sql-select.html "Postgresql 9.3"

Solution 4 - Sql

I am just answering here with the formatted version of the final sql I needed based on Bob Jarvis answer as posted in my comment above:

select n1.name, n1.author_id, cast(count_1 as numeric)/total_count
  from (select id, name, author_id, count(1) as count_1
          from names
          group by id, name, author_id) n1
inner join (select author_id, count(1) as total_count
              from names
              group by author_id) n2
  on (n2.author_id = n1.author_id)

Solution 5 - Sql

select n1.name, n1.author_id, cast(count_1 as numeric)/total_count
  from (select id, name, author_id, count(1) as count_1
          from names
          group by id, name, author_id) n1
inner join (select distinct(author_id), count(1) as total_count
              from names) n2
  on (n2.author_id = n1.author_id)
Where true

used distinct if more inner join, because more join group performance is slow

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionRicardoView Question on Stackoverflow
Solution 1 - SqlBob Jarvis - Слава УкраїніView Answer on Stackoverflow
Solution 2 - SqldeFreitasView Answer on Stackoverflow
Solution 3 - SqldmikamView Answer on Stackoverflow
Solution 4 - SqlRicardoView Answer on Stackoverflow
Solution 5 - SqlZahid GaniView Answer on Stackoverflow