MySQL Join Where Not Exists

MysqlJoinNot Exists

Mysql Problem Overview


I have a MySQL query that joins two tables

  • Voters
  • Households

They join on voters.household_id and household.id.

Now what I need to do is to modify it where the voter table is joined to a third table called elimination, along voter.id and elimination.voter_id. However the catch is that I want to exclude any records in the voter table that have a corresponding record in the elimination table.

How do I craft a query to do this?

This is my current query:

SELECT `voter`.`ID`, `voter`.`Last_Name`, `voter`.`First_Name`,
       `voter`.`Middle_Name`, `voter`.`Age`, `voter`.`Sex`,
       `voter`.`Party`, `voter`.`Demo`, `voter`.`PV`,
       `household`.`Address`, `household`.`City`, `household`.`Zip`
FROM (`voter`)
JOIN `household` ON `voter`.`House_ID`=`household`.`id`
WHERE `CT` = '5'
AND `Precnum` = 'CTY3'
AND  `Last_Name`  LIKE '%Cumbee%'
AND  `First_Name`  LIKE '%John%'
ORDER BY `Last_Name` ASC
LIMIT 30 

Mysql Solutions


Solution 1 - Mysql

I'd probably use a LEFT JOIN, which will return rows even if there's no match, and then you can select only the rows with no match by checking for NULLs.

So, something like:

SELECT V.*
FROM voter V LEFT JOIN elimination E ON V.id = E.voter_id
WHERE E.voter_id IS NULL

Whether that's more or less efficient than using a subquery depends on optimization, indexes, whether its possible to have more than one elimination per voter, etc.

Solution 2 - Mysql

I'd use a 'where not exists' -- exactly as you suggest in your title:

SELECT `voter`.`ID`, `voter`.`Last_Name`, `voter`.`First_Name`,
       `voter`.`Middle_Name`, `voter`.`Age`, `voter`.`Sex`,
       `voter`.`Party`, `voter`.`Demo`, `voter`.`PV`,
       `household`.`Address`, `household`.`City`, `household`.`Zip`
FROM (`voter`)
JOIN `household` ON `voter`.`House_ID`=`household`.`id`
WHERE `CT` = '5'
AND `Precnum` = 'CTY3'
AND  `Last_Name`  LIKE '%Cumbee%'
AND  `First_Name`  LIKE '%John%'

AND NOT EXISTS (
  SELECT * FROM `elimination`
   WHERE `elimination`.`voter_id` = `voter`.`ID`
)

ORDER BY `Last_Name` ASC
LIMIT 30

That may be marginally faster than doing a left join (of course, depending on your indexes, cardinality of your tables, etc), and is almost certainly much faster than using IN.

Solution 3 - Mysql

There are three possible ways to do that.

  1. Option

     SELECT  lt.* FROM    table_left lt
     LEFT JOIN
         table_right rt
     ON      rt.value = lt.value
     WHERE   rt.value IS NULL
    
  2. Option

     SELECT  lt.* FROM    table_left lt
     WHERE   lt.value NOT IN
     (
     SELECT  value
     FROM    table_right rt
     )
    
  3. Option

     SELECT  lt.* FROM    table_left lt
     WHERE   NOT EXISTS
     (
     SELECT  NULL
     FROM    table_right rt
     WHERE   rt.value = lt.value
     )
    

Solution 4 - Mysql

Be wary of "LEFT" JOINS - LEFT JOINS are essentially OUTER JOINS. Different RDBMS query parsers and optimizers may handle OUTER JOINS very differently. Take for instance, how LEFT (OUTER) JOINS are parsed by MySQL's query optimizer, and the difference in resulting execution plans they could evaluate to per iteration:

https://dev.mysql.com/doc/refman/8.0/en/outer-join-simplification.html

LEFT JOINS by their very nature are ALWAYS going to be NonDeterministic. IMO - they should not be used in Production code.

I prefer to write JOIN type statements in a more "old school" approach first, leaving out any specific JOIN declarations. Let the RDBMS query parser do what its designed to do - analyze your statement and translate it to most optimal execution plan based on its evaluation of your index stats and data model design. That said, the build in query parsers / optimizers can even get it wrong, trust me I've seen it happen many times. In general, I feel like taking this approach first generally provides sufficient baseline information to make informed further tuning decisions in most cases.

To illustrate - using the question query from this thread:

SELECT `voter`.`ID`, `voter`.`Last_Name`, `voter`.`First_Name`,
       `voter`.`Middle_Name`, `voter`.`Age`, `voter`.`Sex`,
       `voter`.`Party`, `voter`.`Demo`, `voter`.`PV`,
       `household`.`Address`, `household`.`City`, `household`.`Zip`
FROM (`voter`)
JOIN `household` ON `voter`.`House_ID`=`household`.`id`
WHERE `CT` = '5'
AND `Precnum` = 'CTY3'
AND  `Last_Name`  LIKE '%Cumbee%'
AND  `First_Name`  LIKE '%John%'

AND NOT EXISTS (
  SELECT * FROM `elimination`
   WHERE `elimination`.`voter_id` = `voter`.`ID`
)

ORDER BY `Last_Name` ASC
LIMIT 30

Consider it re-written without the explicit JOIN and NOT EXISTS statements above (assumes the non fully qualified fields in the WHERE clause belonged to the voter table):

SELECT v.`ID`, v.`Last_Name`, v.`First_Name`,
       v.`Middle_Name`, v.`Age`, v.`Sex`,
       v.`Party`, v.`Demo`, v.`PV`,
       h.`Address`, h.`City`, h.`Zip`
FROM `voter` v, `household` h, `elimination` e
WHERE v.`House_ID` = h.`id`
AND v.`ID` != e.`voter_id`
AND v.`CT` = '5'
AND v.`Precnum` = 'CTY3'
AND  v.`Last_Name`  LIKE '%Cumbee%'
AND  v.`First_Name`  LIKE '%John%'
ORDER BY v.`Last_Name` ASC
LIMIT 30;

Try writing some of your future SQL queries BOTH ways syntactically going forward, compare their results, and see what you think. Writing your SQL in the style I have suggested above comes with the added benefit of being more RDBMS agnostic, also.

Cheers!

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
Questiongsueagle2008View Question on Stackoverflow
Solution 1 - MysqlNickZoicView Answer on Stackoverflow
Solution 2 - MysqlIan ClellandView Answer on Stackoverflow
Solution 3 - MysqlDumindu MadushankaView Answer on Stackoverflow
Solution 4 - MysqlChristopher BishopView Answer on Stackoverflow