Left Join without duplicate rows from left table
SqlJoinDuplicatesSql Problem Overview
Please look at the following query:
tbl_Contents
Content_Id Content_Title Content_Text
10002 New case Study New case Study
10003 New case Study New case Study
10004 New case Study New case Study
10005 New case Study New case Study
10006 New case Study New case Study
10007 New case Study New case Study
10008 New case Study New case Study
10009 New case Study New case Study
10010 SEO News Title SEO News Text
10011 SEO News Title SEO News Text
10012 Publish Contents SEO News Text
tbl_Media
Media_Id Media_Title Content_Id
1000 New case Study 10012
1001 SEO News Title 10010
1002 SEO News Title 10011
1003 Publish Contents 10012
QUERY
SELECT
C.Content_ID,
C.Content_Title,
M.Media_Id
FROM tbl_Contents C
LEFT JOIN tbl_Media M ON M.Content_Id = C.Content_Id
ORDER BY C.Content_DatePublished ASC
RESULT
10002 New case Study 2014-03-31 13:39:29.280 NULL
10003 New case Study 2014-03-31 14:23:06.727 NULL
10004 New case Study 2014-03-31 14:25:53.143 NULL
10005 New case Study 2014-03-31 14:26:06.993 NULL
10006 New case Study 2014-03-31 14:30:18.153 NULL
10007 New case Study 2014-03-31 14:30:42.513 NULL
10008 New case Study 2014-03-31 14:31:56.830 NULL
10009 New case Study 2014-03-31 14:35:18.040 NULL
10010 SEO News Title 2014-03-31 15:22:15.983 1001
10011 SEO News Title 2014-03-31 15:22:30.333 1002
10012 Publish 2014-03-31 15:25:11.753 1000
10012 Publish 2014-03-31 15:25:11.753 1003
10012 are coming twice...!
My query is returning duplicate rows from tbl_Contents (left table in the join)
Some rows in tbl_Contents has more than 1 associated rows in tbl_Media. I need all rows from tbl_Contents even if there are Null values exists in the tbl_Media BUT NO DUPLICATE RECORDS.
Sql Solutions
Solution 1 - Sql
Try an OUTER APPLY
SELECT
C.Content_ID,
C.Content_Title,
C.Content_DatePublished,
M.Media_Id
FROM
tbl_Contents C
OUTER APPLY
(
SELECT TOP 1 *
FROM tbl_Media M
WHERE M.Content_Id = C.Content_Id
) m
ORDER BY
C.Content_DatePublished ASC
Alternatively, you could GROUP BY
the results
SELECT
C.Content_ID,
C.Content_Title,
C.Content_DatePublished,
M.Media_Id
FROM
tbl_Contents C
LEFT OUTER JOIN tbl_Media M ON M.Content_Id = C.Content_Id
GROUP BY
C.Content_ID,
C.Content_Title,
C.Content_DatePublished,
M.Media_Id
ORDER BY
C.Content_DatePublished ASC
The OUTER APPLY
selects a single row (or none) that matches each row from the left table.
The GROUP BY
performs the entire join, but then collapses the final result rows on the provided columns.
Solution 2 - Sql
You can do this using generic SQL with group by
:
SELECT C.Content_ID, C.Content_Title, MAX(M.Media_Id)
FROM tbl_Contents C LEFT JOIN
tbl_Media M
ON M.Content_Id = C.Content_Id
GROUP BY C.Content_ID, C.Content_Title
ORDER BY MAX(C.Content_DatePublished) ASC;
Or with a correlated subquery:
SELECT C.Content_ID, C.Contt_Title,
(SELECT M.Media_Id
FROM tbl_Media M
WHERE M.Content_Id = C.Content_Id
ORDER BY M.MEDIA_ID DESC
LIMIT 1
) as Media_Id
FROM tbl_Contents C
ORDER BY C.Content_DatePublished ASC;
Of course, the syntax for limit 1
varies between databases. Could be top
. Or rownum = 1
. Or fetch first 1 rows
. Or something like that.
Solution 3 - Sql
Using the DISTINCT flag will remove duplicate rows.
SELECT DISTINCT
C.Content_ID,
C.Content_Title,
M.Media_Id
FROM tbl_Contents C
LEFT JOIN tbl_Media M ON M.Content_Id = C.Content_Id
ORDER BY C.Content_DatePublished ASC
Solution 4 - Sql
The table tbl_media has the content_id 10012 appearing two times, so when tbl_content joins with tbl_media, it will capture 10012 twice, creating one row with media_id = 1003 and another one with media_id = 1000 (so if you consider both content_id and media_id, the information is actually not duplicated).
The question becomes, what is the correct media to prioritize for content 10012 (1000 or 1003)? Once you determine this relationship, you can adjust the IDs tbl_media and join the tables without creating duplicates on media_id level.