MySQL Join Where Not Exists
Mysql, Join, Not Exists
Mysql Problem Overview
I have a MySQL query that joins two tables:
- Voters
- Households
They join on voters.household_id and household.id.
Now I need to modify it so that the voter table is also joined to a third table called elimination, on voter.id and elimination.voter_id. The catch is that I want to exclude any records in the voter table that have a corresponding record in the elimination table.
How do I craft a query to do this?
This is my current query:
SELECT `voter`.`ID`, `voter`.`Last_Name`, `voter`.`First_Name`,
`voter`.`Middle_Name`, `voter`.`Age`, `voter`.`Sex`,
`voter`.`Party`, `voter`.`Demo`, `voter`.`PV`,
`household`.`Address`, `household`.`City`, `household`.`Zip`
FROM (`voter`)
JOIN `household` ON `voter`.`House_ID`=`household`.`id`
WHERE `CT` = '5'
AND `Precnum` = 'CTY3'
AND `Last_Name` LIKE '%Cumbee%'
AND `First_Name` LIKE '%John%'
ORDER BY `Last_Name` ASC
LIMIT 30
Mysql Solutions
Solution 1 - Mysql
I'd probably use a LEFT JOIN, which will return rows even if there's no match, and then you can select only the rows with no match by checking for NULLs.
So, something like:
SELECT V.*
FROM voter V LEFT JOIN elimination E ON V.id = E.voter_id
WHERE E.voter_id IS NULL
Whether that's more or less efficient than using a subquery depends on optimization, indexes, whether it's possible to have more than one elimination per voter, etc.
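To see how the LEFT JOIN / IS NULL pattern slots into the three-table query from the question, here is a minimal sketch. It makes several assumptions: it runs against an in-memory SQLite database through Python's sqlite3 module as a stand-in for MySQL, the tables are cut down to a handful of columns, and the sample rows are invented for illustration.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; schema and rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE household (id INTEGER PRIMARY KEY, Address TEXT);
    CREATE TABLE voter (ID INTEGER PRIMARY KEY, Last_Name TEXT, House_ID INTEGER);
    CREATE TABLE elimination (voter_id INTEGER);
    INSERT INTO household VALUES (1, '12 Oak St'), (2, '34 Elm St');
    INSERT INTO voter VALUES (1, 'Cumbee', 1), (2, 'Cumbee', 1), (3, 'Cumbee', 2);
    -- voter 2 is eliminated twice, to show that repeated matches are harmless
    INSERT INTO elimination VALUES (2), (2);
""")

anti_join = """
    SELECT v.ID, v.Last_Name, h.Address
    FROM voter v
    JOIN household h ON v.House_ID = h.id
    LEFT JOIN elimination e ON e.voter_id = v.ID
    WHERE e.voter_id IS NULL            -- keep only voters with no elimination row
      AND v.Last_Name LIKE '%Cumbee%'
    ORDER BY v.ID
"""
rows = conn.execute(anti_join).fetchall()
print(rows)  # [(1, 'Cumbee', '12 Oak St'), (3, 'Cumbee', '34 Elm St')]
```

Because the IS NULL filter discards every matched row, a voter with several elimination records is simply dropped rather than duplicated.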
Solution 2 - Mysql
I'd use a 'where not exists' -- exactly as you suggest in your title:
SELECT `voter`.`ID`, `voter`.`Last_Name`, `voter`.`First_Name`,
`voter`.`Middle_Name`, `voter`.`Age`, `voter`.`Sex`,
`voter`.`Party`, `voter`.`Demo`, `voter`.`PV`,
`household`.`Address`, `household`.`City`, `household`.`Zip`
FROM (`voter`)
JOIN `household` ON `voter`.`House_ID`=`household`.`id`
WHERE `CT` = '5'
AND `Precnum` = 'CTY3'
AND `Last_Name` LIKE '%Cumbee%'
AND `First_Name` LIKE '%John%'
AND NOT EXISTS (
SELECT * FROM `elimination`
WHERE `elimination`.`voter_id` = `voter`.`ID`
)
ORDER BY `Last_Name` ASC
LIMIT 30
That may be marginally faster than doing a left join (of course, depending on your indexes, cardinality of your tables, etc), and is almost certainly much faster than using IN.
Solution 3 - Mysql
There are three possible ways to do that.
- Option 1, LEFT JOIN with an IS NULL check:
SELECT lt.*
FROM table_left lt
LEFT JOIN table_right rt ON rt.value = lt.value
WHERE rt.value IS NULL
- Option 2, NOT IN:
SELECT lt.*
FROM table_left lt
WHERE lt.value NOT IN ( SELECT value FROM table_right rt )
- Option 3, NOT EXISTS:
SELECT lt.*
FROM table_left lt
WHERE NOT EXISTS ( SELECT NULL FROM table_right rt WHERE rt.value = lt.value )
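The three forms agree only as long as the compared column never holds NULL. The sketch below runs all three against the same data, with one NULL placed in table_right.value. It is an illustration under assumptions: in-memory SQLite via Python's sqlite3 stands in for MySQL (the NULL handling shown is standard SQL three-valued logic), and the sample values are invented.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; sample values are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_left (value INTEGER);
    CREATE TABLE table_right (value INTEGER);
    INSERT INTO table_left VALUES (1), (2), (3);
    INSERT INTO table_right VALUES (2), (NULL);
""")

queries = {
    "left_join": """SELECT lt.value FROM table_left lt
                    LEFT JOIN table_right rt ON rt.value = lt.value
                    WHERE rt.value IS NULL
                    ORDER BY lt.value""",
    "not_in": """SELECT lt.value FROM table_left lt
                 WHERE lt.value NOT IN (SELECT value FROM table_right rt)
                 ORDER BY lt.value""",
    "not_exists": """SELECT lt.value FROM table_left lt
                     WHERE NOT EXISTS (SELECT NULL FROM table_right rt
                                       WHERE rt.value = lt.value)
                     ORDER BY lt.value""",
}

# Run each variant and collect the surviving left-hand values.
results = {name: [row[0] for row in conn.execute(sql)]
           for name, sql in queries.items()}
for name, values in results.items():
    print(name, values)
# left_join [1, 3]
# not_in []
# not_exists [1, 3]
```

NOT IN returns nothing here because comparing any value with NULL yields NULL rather than true, so no row can pass the filter. If the column is nullable, the LEFT JOIN and NOT EXISTS forms are the safer choice.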
Solution 4 - Mysql
Be wary of LEFT JOINs. A LEFT JOIN is a LEFT OUTER JOIN, and different RDBMS query parsers and optimizers may handle outer joins very differently. Take, for instance, how MySQL's query optimizer simplifies LEFT (OUTER) JOINs, and how much the resulting execution plans can differ:
https://dev.mysql.com/doc/refman/8.0/en/outer-join-simplification.html
The plan an optimizer picks for a LEFT JOIN is hard to predict (the result set itself is still deterministic for a given data set). IMO, they should not be used in production code.
I prefer to write JOIN-type statements in a more "old school" way first, leaving out any explicit JOIN declarations. Let the RDBMS query parser do what it's designed to do: analyze your statement and translate it into the most efficient execution plan it can find, based on its evaluation of your index stats and data model design. That said, the built-in query parsers and optimizers can still get it wrong; trust me, I've seen it happen many times. In most cases, taking this approach first gives a sufficient baseline for making informed tuning decisions afterwards.
To illustrate - using the question query from this thread:
SELECT `voter`.`ID`, `voter`.`Last_Name`, `voter`.`First_Name`,
`voter`.`Middle_Name`, `voter`.`Age`, `voter`.`Sex`,
`voter`.`Party`, `voter`.`Demo`, `voter`.`PV`,
`household`.`Address`, `household`.`City`, `household`.`Zip`
FROM (`voter`)
JOIN `household` ON `voter`.`House_ID`=`household`.`id`
WHERE `CT` = '5'
AND `Precnum` = 'CTY3'
AND `Last_Name` LIKE '%Cumbee%'
AND `First_Name` LIKE '%John%'
AND NOT EXISTS (
SELECT * FROM `elimination`
WHERE `elimination`.`voter_id` = `voter`.`ID`
)
ORDER BY `Last_Name` ASC
LIMIT 30
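Consider it rewritten without the explicit JOIN and NOT EXISTS clauses above (assuming the unqualified columns in the WHERE clause belong to the voter table). Note that the != condition is not an equivalent of NOT EXISTS: it pairs each voter with every elimination row that belongs to a different voter, so the two queries return the same rows only when elimination holds exactly one row: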
SELECT v.`ID`, v.`Last_Name`, v.`First_Name`,
v.`Middle_Name`, v.`Age`, v.`Sex`,
v.`Party`, v.`Demo`, v.`PV`,
h.`Address`, h.`City`, h.`Zip`
FROM `voter` v, `household` h, `elimination` e
WHERE v.`House_ID` = h.`id`
AND v.`ID` != e.`voter_id`
AND v.`CT` = '5'
AND v.`Precnum` = 'CTY3'
AND v.`Last_Name` LIKE '%Cumbee%'
AND v.`First_Name` LIKE '%John%'
ORDER BY v.`Last_Name` ASC
LIMIT 30;
Try writing some of your future SQL queries both ways, compare their results, and see what you think. Writing your SQL in the style I have suggested above also has the added benefit of being more RDBMS-agnostic.
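One way to carry out that comparison is sketched below: it runs the NOT EXISTS form and the comma-join form with != against the same small data set. This is an illustration under assumptions: in-memory SQLite via Python's sqlite3 stands in for MySQL, the tables are reduced to the columns that matter for the exclusion, and the sample rows are invented.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; schema and rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE voter (ID INTEGER PRIMARY KEY, Last_Name TEXT);
    CREATE TABLE elimination (voter_id INTEGER);
    INSERT INTO voter VALUES (1, 'Adams'), (2, 'Baker'), (3, 'Cumbee');
    INSERT INTO elimination VALUES (1), (2);   -- voters 1 and 2 are eliminated
""")

not_exists_sql = """
    SELECT v.ID FROM voter v
    WHERE NOT EXISTS (SELECT * FROM elimination e WHERE e.voter_id = v.ID)
    ORDER BY v.ID
"""
comma_join_sql = """
    SELECT v.ID FROM voter v, elimination e
    WHERE v.ID != e.voter_id
    ORDER BY v.ID
"""

def ids(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]

with_not_exists = ids(not_exists_sql)
with_comma_join = ids(comma_join_sql)
print(with_not_exists)  # [3]
print(with_comma_join)  # [1, 2, 3, 3]

# With an empty elimination table the cross join has no rows to pair with.
conn.execute("DELETE FROM elimination")
empty_not_exists = ids(not_exists_sql)
empty_comma_join = ids(comma_join_sql)
print(empty_not_exists)  # [1, 2, 3]
print(empty_comma_join)  # []
```

On this data the two forms disagree. The != condition keeps every voter/elimination pair whose IDs differ, so eliminated voters reappear and the remaining voter is duplicated, while an empty elimination table yields no rows at all. The NOT EXISTS form (or the LEFT JOIN / IS NULL form) is the one that expresses "voters with no elimination record".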
Cheers!