Left Join without duplicate rows from left table

SqlJoinDuplicates

Sql Problem Overview


Please look at the following query:

tbl_Contents

Content_Id	Content_Title	 Content_Text
10002	New case Study	 New case Study
10003	New case Study	 New case Study
10004	New case Study	 New case Study
10005	New case Study	 New case Study
10006	New case Study	 New case Study
10007	New case Study	 New case Study
10008	New case Study	 New case Study
10009	New case Study	 New case Study
10010	SEO News Title	 SEO News Text
10011	SEO News Title	 SEO News Text
10012	Publish Contents SEO News Text

tbl_Media

Media_Id	Media_Title	 Content_Id
1000	New case Study	 10012
1001	SEO News Title	 10010
1002	SEO News Title	 10011
1003	Publish Contents 10012

QUERY

SELECT 
C.Content_ID,
C.Content_Title,
M.Media_Id
		
FROM tbl_Contents C
LEFT JOIN tbl_Media M ON M.Content_Id = C.Content_Id 
ORDER BY C.Content_DatePublished ASC

RESULT

10002	New case Study	2014-03-31 13:39:29.280	NULL
10003	New case Study	2014-03-31 14:23:06.727	NULL
10004	New case Study	2014-03-31 14:25:53.143	NULL
10005	New case Study	2014-03-31 14:26:06.993	NULL
10006	New case Study	2014-03-31 14:30:18.153	NULL
10007	New case Study	2014-03-31 14:30:42.513	NULL
10008	New case Study	2014-03-31 14:31:56.830	NULL
10009	New case Study	2014-03-31 14:35:18.040	NULL
10010	SEO News Title	2014-03-31 15:22:15.983	1001
10011	SEO News Title	2014-03-31 15:22:30.333	1002
10012	Publish         2014-03-31 15:25:11.753	1000
10012	Publish         2014-03-31 15:25:11.753	1003

10012 are coming twice...!

My query is returning duplicate rows from tbl_Contents (left table in the join)

Some rows in tbl_Contents has more than 1 associated rows in tbl_Media. I need all rows from tbl_Contents even if there are Null values exists in the tbl_Media BUT NO DUPLICATE RECORDS.

Sql Solutions


Solution 1 - Sql

Try an OUTER APPLY

SELECT 
    C.Content_ID,
    C.Content_Title,
    C.Content_DatePublished,
    M.Media_Id
FROM 
    tbl_Contents C
    OUTER APPLY
    (
        SELECT TOP 1 *
        FROM tbl_Media M 
        WHERE M.Content_Id = C.Content_Id 
    ) m
ORDER BY 
    C.Content_DatePublished ASC

Alternatively, you could GROUP BY the results

SELECT 
    C.Content_ID,
    C.Content_Title,
    C.Content_DatePublished,
    M.Media_Id
FROM 
    tbl_Contents C
    LEFT OUTER JOIN tbl_Media M ON M.Content_Id = C.Content_Id 
GROUP BY
    C.Content_ID,
    C.Content_Title,
    C.Content_DatePublished,
    M.Media_Id
ORDER BY
    C.Content_DatePublished ASC

The OUTER APPLY selects a single row (or none) that matches each row from the left table.

The GROUP BY performs the entire join, but then collapses the final result rows on the provided columns.

Solution 2 - Sql

You can do this using generic SQL with group by:

SELECT C.Content_ID, C.Content_Title, MAX(M.Media_Id)
FROM tbl_Contents C LEFT JOIN
     tbl_Media M
     ON M.Content_Id = C.Content_Id 
GROUP BY C.Content_ID, C.Content_Title
ORDER BY MAX(C.Content_DatePublished) ASC;

Or with a correlated subquery:

SELECT C.Content_ID, C.Contt_Title,
       (SELECT M.Media_Id
        FROM tbl_Media M
        WHERE M.Content_Id = C.Content_Id
        ORDER BY M.MEDIA_ID DESC
        LIMIT 1
       ) as Media_Id
FROM tbl_Contents C 
ORDER BY C.Content_DatePublished ASC;

Of course, the syntax for limit 1 varies between databases. Could be top. Or rownum = 1. Or fetch first 1 rows. Or something like that.

Solution 3 - Sql

Using the DISTINCT flag will remove duplicate rows.

SELECT DISTINCT
C.Content_ID,
C.Content_Title,
M.Media_Id

FROM tbl_Contents C
LEFT JOIN tbl_Media M ON M.Content_Id = C.Content_Id 
ORDER BY C.Content_DatePublished ASC

Solution 4 - Sql

The table tbl_media has the content_id 10012 appearing two times, so when tbl_content joins with tbl_media, it will capture 10012 twice, creating one row with media_id = 1003 and another one with media_id = 1000 (so if you consider both content_id and media_id, the information is actually not duplicated).

The question becomes, what is the correct media to prioritize for content 10012 (1000 or 1003)? Once you determine this relationship, you can adjust the IDs tbl_media and join the tables without creating duplicates on media_id level.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
Questionurooj.orgView Question on Stackoverflow
Solution 1 - Sqleouw0o83hfView Answer on Stackoverflow
Solution 2 - SqlGordon LinoffView Answer on Stackoverflow
Solution 3 - SqlVictor MoxleyView Answer on Stackoverflow
Solution 4 - SqlFernando C. A. PereiraView Answer on Stackoverflow