How to join only one row in joined table with postgres?

Sql, Postgresql, Join

Sql Problem Overview


I have the following schema:

CREATE TABLE author (
    id   integer
  , name varchar(255)
);
CREATE TABLE book (
    id        integer
  , author_id integer
  , title     varchar(255)
  , rating    integer
);

And I want each author with its last book:

SELECT book.id, author.id, author.name, book.title as last_book
FROM author
JOIN book book ON book.author_id = author.id

GROUP BY author.id
ORDER BY book.id ASC

Apparently you can do that in mysql: https://stackoverflow.com/questions/6468314/join-two-tables-returning-just-one-row-from-the-second-table-mysql.

But postgres gives this error:

> ERROR: column "book.id" must appear in the GROUP BY clause or be used in an aggregate function: SELECT book.id, author.id, author.name, book.title as last_book FROM author JOIN book book ON book.author_id = author.id GROUP BY author.id ORDER BY book.id ASC

It's because:

> When GROUP BY is present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions, since there would be more than one possible value to return for an ungrouped column.

How can I tell postgres: "Give me only the last row in the joined table, when ordered by joined_table.id"?


Edit: With this data:

INSERT INTO author (id, name) VALUES
  (1, 'Bob')
, (2, 'David')
, (3, 'John');

INSERT INTO book (id, author_id, title, rating) VALUES
  (1, 1, '1st book from bob', 5)
, (2, 1, '2nd book from bob', 6)
, (3, 1, '3rd book from bob', 7)
, (4, 2, '1st book from David', 6)
, (5, 2, '2nd book from David', 6);

I should see:

book_id author_id name    last_book
3       1         "Bob"   "3rd book from bob"
5       2         "David" "2nd book from David"

Sql Solutions


Solution 1 - Sql

select distinct on (author.id)
    book.id, author.id, author.name, book.title as last_book
from
    author
    inner join
    book on book.author_id = author.id
order by author.id, book.id desc

See distinct on in the PostgreSQL documentation:

> SELECT DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. The DISTINCT ON expressions are interpreted using the same rules as for ORDER BY (see above). Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first.

With distinct on, the "distinct" expressions must be the leftmost expressions in the order by. If that is not the final order you want, wrap the query and reorder the result:

select 
    *
from (
    select distinct on (author.id)
        book.id, author.id, author.name, book.title as last_book
    from
        author
        inner join
        book on book.author_id = author.id
    order by author.id, book.id desc
) authors_with_last_book
order by authors_with_last_book.name

Another solution is to use a window function, as in Lennart's answer (Solution 4). Another, very generic one is to join on the highest book id per author:

select 
    book.id, author.id, author.name, book.title as last_book
from
    book
    inner join
    (
        select author.id as author_id, max(book.id) as book_id
        from
            author
            inner join
            book on author.id = book.author_id
        group by author.id
    ) s
    on s.book_id = book.id
    inner join
    author on book.author_id = author.id

Solution 2 - Sql

I've done something similar for a chat system, where room holds the metadata and list contains the messages. I ended up using the Postgresql LATERAL JOIN which worked like a charm.

SELECT MR.id AS room_id, MR.created_at AS room_created, 
    lastmess.content as lastmessage_content, lastmess.datetime as lastmessage_when
FROM message.room MR
    LEFT JOIN LATERAL (
        SELECT content, datetime
        FROM message.list
        WHERE room_id = MR.id
        ORDER BY datetime DESC 
        LIMIT 1) lastmess ON true
ORDER BY lastmessage_when DESC NULLS LAST, MR.created_at DESC

For more info see https://heap.io/blog/engineering/postgresqls-powerful-new-join-type-lateral
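
The query above uses the answerer's own chat schema. As a rough sketch of the same LATERAL pattern applied to the author and book tables from the question (the alias lb is only illustrative, and LATERAL requires PostgreSQL 9.3 or later), it might look like this:

```sql
-- Sketch: the LATERAL pattern applied to the question's author/book schema.
SELECT lb.id    AS book_id
     , author.id AS author_id
     , author.name
     , lb.title AS last_book
FROM author
CROSS JOIN LATERAL (
    -- Evaluated once per author row; keeps only that author's highest book id.
    SELECT book.id, book.title
    FROM book
    WHERE book.author_id = author.id
    ORDER BY book.id DESC
    LIMIT 1
) lb
ORDER BY lb.id;
```

CROSS JOIN LATERAL drops authors without any book (John in the sample data), which matches the expected output in the question. Using LEFT JOIN LATERAL ( ... ) lb ON true instead, as in the chat query, would keep such authors with NULL book columns.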

Solution 3 - Sql

This may look archaic and overly simple, but it does not depend on window functions, CTEs, or aggregating subqueries. In most cases it is also the fastest.

SELECT bk.id, au.id, au.name, bk.title as last_book
FROM author au
JOIN book bk ON bk.author_id = au.id
-- keep a book only if the same author has no book with a higher id
WHERE NOT EXISTS (
    SELECT *
    FROM book nx
    WHERE nx.author_id = bk.author_id
    AND nx.id > bk.id
    )
ORDER BY bk.id ASC
    ;
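
How fast this anti-join runs depends heavily on the data and on the available indexes. The question's schema declares no keys or indexes at all, so the following is only a sketch under that assumption: a composite index (the name is hypothetical) that would let PostgreSQL look up "a later book by the same author" directly, plus an EXPLAIN to check whether the planner actually uses it.

```sql
-- Hypothetical index, not part of the question's schema: supports the
-- correlated lookup nx.author_id = bk.author_id AND nx.id > bk.id.
CREATE INDEX book_author_id_id_idx ON book (author_id, id);

-- Inspect the plan; on very small tables the planner may still
-- prefer a sequential scan over the index.
EXPLAIN
SELECT bk.id, au.id, au.name, bk.title AS last_book
FROM author au
JOIN book bk ON bk.author_id = au.id
WHERE NOT EXISTS (
    SELECT 1
    FROM book nx
    WHERE nx.author_id = bk.author_id
      AND nx.id > bk.id
);
```

The same index would also be a plausible fit for the max(id) and LATERAL variants in the other solutions, since they all search book by author_id and then by id.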

Solution 4 - Sql

Here is one way:

SELECT book_id, author_id, author_name, last_book
FROM (
    SELECT b.id as book_id
         , a.id as author_id
         , a.name as author_name
         , b.title as last_book
         , row_number() over (partition by a.id
                              order by b.id desc) as rn
    FROM author a
    JOIN book b 
        ON b.author_id = a.id
) last_books
WHERE rn = 1;

Solution 5 - Sql

You can add a condition to the join that selects only one row. This worked for me.

Like this:

SELECT 
	book.id, 
	auth1.id, 
	auth1.name, 
	book.title as last_book
FROM author auth1
-- the correlated subquery picks the highest book id for each author
JOIN book book ON (book.author_id = auth1.id AND book.id = (select max(b.id) from book b where b.author_id = auth1.id))
ORDER BY book.id ASC

This way you get the data from the book with the highest id. If the table had a date column, you could do the same with max(date).
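
As a hedged sketch of that date-based variant: the column published_on below is hypothetical and does not exist in the question's schema, so treat this as an illustration of the idea rather than a drop-in query.

```sql
-- Assumes a hypothetical extra column: book.published_on date
-- (NOT part of the question's schema).
SELECT book.id   AS book_id
     , auth1.id  AS author_id
     , auth1.name
     , book.title AS last_book
FROM author auth1
JOIN book ON book.author_id = auth1.id
         AND book.published_on = (
             -- latest publication date among this author's books
             SELECT max(b.published_on)
             FROM book b
             WHERE b.author_id = auth1.id
         )
ORDER BY book.id ASC;
```

Unlike id, a date is not guaranteed to be unique per author: if two books by the same author share the latest date, both rows are returned, so a tie-breaker would be needed to get strictly one row.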

Solution 6 - Sql

As a slight variation on @wildplasser's suggestion, which still works across implementations, you can use max rather than not exists. This reads better if you prefer short joins to long where clauses:

select * 
  from author au
  join (
    select max(id) as max_id, author_id
      from book bk
     group by author_id) as lb 
    on lb.author_id = au.id
  join book bk 
    on bk.id = lb.max_id;

Or, to give the subquery a name, which clarifies things, use WITH:

with last_book as 
   (select max(id) as max_id, author_id
      from book bk
     group by author_id)

select * 
  from author au
  join last_book lb
    on au.id = lb.author_id
  join book bk 
    on bk.id = lb.max_id;

Solution 7 - Sql

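create temp table book_1 as (
SELECT
id
,title
,author_id
-- number each author's books, highest id first, so rownum = 1 is the last book
,row_number() OVER (PARTITION BY author_id ORDER BY id DESC) as rownum 
FROM
book);
-- the original answer appended "distributed by ( id )" to the statement above;
-- that is Greenplum syntax and is not valid in plain PostgreSQL

select b.id, author.id, author.name, b.title as last_book
from
    author

    -- left join keeps authors that have no book at all
    left  join
   (select * from  book_1 where rownum = 1 ) b on b.author_id = author.id
order by author.id, b.id desc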

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type      Original Author     Original Content on Stackoverflow
Question          Benjamin Crouzier   View Question on Stackoverflow
Solution 1 - Sql  Clodoaldo Neto      View Answer on Stackoverflow
Solution 2 - Sql  Mirthe              View Answer on Stackoverflow
Solution 3 - Sql  wildplasser         View Answer on Stackoverflow
Solution 4 - Sql  Lennart             View Answer on Stackoverflow
Solution 5 - Sql  Taochok             View Answer on Stackoverflow
Solution 6 - Sql  jobermark           View Answer on Stackoverflow
Solution 7 - Sql  Bobburi Madhu       View Answer on Stackoverflow