SQL join: selecting the last records in a one-to-many relationship

SqlSelectJoinIndexingGreatest N-per-Group

Sql Problem Overview


Suppose I have a table of customers and a table of purchases. Each purchase belongs to one customer. I want to get a list of all customers along with their last purchase in one SELECT statement. What is the best practice? Any advice on building indexes?

Please use these table/column names in your answer:

  • customer: id, name
  • purchase: id, customer_id, item_id, date

And in more complicated situations, would it be (performance-wise) beneficial to denormalize the database by putting the last purchase into the customer table?

If the (purchase) id is guaranteed to be sorted by date, can the statements be simplified by using something like LIMIT 1?

Sql Solutions


Solution 1 - Sql

This is an example of the greatest-n-per-group problem that has appeared regularly on StackOverflow.

Here's how I usually recommend solving it:

SELECT c.*, p1.*
FROM customer c
JOIN purchase p1 ON (c.id = p1.customer_id)
LEFT OUTER JOIN purchase p2 ON (c.id = p2.customer_id AND 
    (p1.date < p2.date OR (p1.date = p2.date AND p1.id < p2.id)))
WHERE p2.id IS NULL;

Explanation: given a row p1, there should be no row p2 with the same customer and a later date (or in the case of ties, a later id). When we find that to be true, then p1 is the most recent purchase for that customer.

Regarding indexes, I'd create a compound index in purchase over the columns (customer_id, date, id). That may allow the outer join to be done using a covering index. Be sure to test on your platform, because optimization is implementation-dependent. Use the features of your RDBMS to analyze the optimization plan. E.g. EXPLAIN on MySQL.


Some people use subqueries instead of the solution I show above, but I find my solution makes it easier to resolve ties.

Solution 2 - Sql

You could also try doing this using a sub select

SELECT	c.*, p.*
FROM	customer c INNER JOIN
		(
			SELECT	customer_id,
					MAX(date) MaxDate
			FROM	purchase
			GROUP BY customer_id
		) MaxDates ON c.id = MaxDates.customer_id INNER JOIN
		purchase p ON	MaxDates.customer_id = p.customer_id
					AND	MaxDates.MaxDate = p.date

The select should join on all customers and their Last purchase date.

Solution 3 - Sql

Another approach would be to use a NOT EXISTS condition in your join condition to test for later purchases:

SELECT *
FROM customer c
LEFT JOIN purchase p ON (
       c.id = p.customer_id
   AND NOT EXISTS (
     SELECT 1 FROM purchase p1
     WHERE p1.customer_id = c.id
     AND p1.id > p.id
   )
)

Solution 4 - Sql

If you're using PostgreSQL you can use DISTINCT ON to find the first row in a group.

SELECT customer.*, purchase.*
FROM customer
JOIN (
   SELECT DISTINCT ON (customer_id) *
   FROM purchase
   ORDER BY customer_id, date DESC
) purchase ON purchase.customer_id = customer.id

PostgreSQL Docs - Distinct On

Note that the DISTINCT ON field(s) -- here customer_id -- must match the left most field(s) in the ORDER BY clause.

Caveat: This is a nonstandard clause.

Solution 5 - Sql

You haven't specified the database. If it is one that allows analytical functions it may be faster to use this approach than the GROUP BY one(definitely faster in Oracle, most likely faster in the late SQL Server editions, don't know about others).

Syntax in SQL Server would be:

SELECT c.*, p.*
FROM customer c INNER JOIN 
     (SELECT RANK() OVER (PARTITION BY customer_id ORDER BY date DESC) r, *
             FROM purchase) p
ON (c.id = p.customer_id)
WHERE p.r = 1

Solution 6 - Sql

I found this thread as a solution to my problem.

But when I tried them the performance was low. Bellow is my suggestion for better performance.

With MaxDates as (
SELECT  customer_id,
                MAX(date) MaxDate
        FROM    purchase
        GROUP BY customer_id
)

SELECT  c.*, M.*
FROM    customer c INNER JOIN
        MaxDates as M ON c.id = M.customer_id 

Hope this will be helpful.

Solution 7 - Sql

Try this, It will help.

I have used this in my project.

SELECT 
*
FROM
customer c
OUTER APPLY(SELECT top 1 * FROM purchase pi 
WHERE pi.customer_id = c.Id order by pi.Id desc) AS [LastPurchasePrice]

Solution 8 - Sql

Tested on SQLite:

SELECT c.*, p.*, max(p.date)
FROM customer c
LEFT OUTER JOIN purchase p
ON c.id = p.customer_id
GROUP BY c.id

The max() aggregate function will make sure that the latest purchase is selected from each group (but assumes that the date column is in a format whereby max() gives the latest - which is normally the case). If you want to handle purchases with the same date then you can use max(p.date, p.id).

In terms of indexes, I would use an index on purchase with (customer_id, date, [any other purchase columns you want to return in your select]).

The LEFT OUTER JOIN (as opposed to INNER JOIN) will make sure that customers that have never made a purchase are also included.

Solution 9 - Sql

Please try this,

SELECT 
c.Id,
c.name,
(SELECT pi.price FROM purchase pi WHERE pi.Id = MAX(p.Id)) AS [LastPurchasePrice]
FROM customer c INNER JOIN purchase p 
ON c.Id = p.customerId 
GROUP BY c.Id,c.name;

Solution 10 - Sql

I needed what you needed, albeit many years later, and tried the two most popular answers. These did not yield the desired fruit. So this is what I have to offer... For clarity, I changed some names.

SELECT 
  cc.pk_ID AS pk_Customer_ID, 
  cc.Customer_Name AS Customer_Name, 
  IFNULL(pp.pk_ID, '') AS fk_Purchase_ID,
  IFNULL(pp.fk_Customer_ID, '') AS fk_Customer_ID,
  IFNULL(pp.fk_Item_ID, '') AS fk_Item_ID,
  IFNULL(pp.Purchase_Date, '') AS Purchase_Date
FROM customer cc
LEFT JOIN purchase pp ON (
  SELECT zz.pk_ID 
  FROM purchase zz 
  WHERE cc.pk_ID = zz.fk_Customer_ID 
  ORDER BY zz.Purchase_Date DESC LIMIT 1) = pp.pk_ID
ORDER BY cc.pk_ID;

Solution 11 - Sql

On SQL Server you could use:

SELECT *
FROM customer c
INNER JOIN purchase p on c.id = p.customer_id
WHERE p.id = (
	SELECT TOP 1 p2.id
	FROM purchase p2
	WHERE p.customer_id = p2.customer_id
	ORDER BY date DESC
)

SQL Server Fiddle: http://sqlfiddle.com/#!18/262fd/2

On MySQL you could use:

SELECT c.name, date
FROM customer c
INNER JOIN purchase p on c.id = p.customer_id
WHERE p.id = (
	SELECT p2.id
	FROM purchase p2
	WHERE p.customer_id = p2.customer_id
	ORDER BY date DESC
    LIMIT 1
)

MySQL Fiddle: http://sqlfiddle.com/#!9/202613/7

Solution 12 - Sql

Without getting into the code first, the logic/algorithm goes below:

  1. Go to the transaction table with multiple records for the same client.

  2. Select records of clientID and the latestDate of client's activity using group by clientID and max(transactionDate)

       select clientID, max(transactionDate) as latestDate 
       from transaction 
       group by clientID
    
  3. inner join the transaction table with the outcome from Step 2, then you will have the full records of the transaction table with only each client's latest record.

       select * from 
       transaction t 
       inner join (
         select clientID, max(transactionDate) as latestDate
         from transaction 
         group by clientID) d 
       on t.clientID = d.clientID and t.transactionDate = d.latestDate) 
    
  4. You can use the result from step 3 to join any table you want to get different results.

Solution 13 - Sql

Tables :

Customer => id, name
Purchase => id, customer_id, item_id, date

Query :

SELECT C.id, C.name, P.id, P.date
  FROM customer AS C
  LEFT JOIN purchase AS P ON 
    (
      P.customer_id = C.id 
      AND P.id IN (
        SELECT MAX(PP.id) FROM purchase AS PP GROUP BY PP.customer_id
      )
    )

You can also specify some condition into sub select query

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionnetvopeView Question on Stackoverflow
Solution 1 - SqlBill KarwinView Answer on Stackoverflow
Solution 2 - SqlAdriaan StanderView Answer on Stackoverflow
Solution 3 - SqlStefan HaberlView Answer on Stackoverflow
Solution 4 - SqlTate ThurstonView Answer on Stackoverflow
Solution 5 - SqlMadalina DragomirView Answer on Stackoverflow
Solution 6 - SqlMatheeView Answer on Stackoverflow
Solution 7 - SqlRahul MurariView Answer on Stackoverflow
Solution 8 - SqlMarkView Answer on Stackoverflow
Solution 9 - SqlMilad ShahbaziView Answer on Stackoverflow
Solution 10 - SqlDanimalReksView Answer on Stackoverflow
Solution 11 - SqlcelsowmView Answer on Stackoverflow
Solution 12 - SqlGary Bao 鲍昱彤View Answer on Stackoverflow
Solution 13 - SqlGaurav PatilView Answer on Stackoverflow