How do SQL EXISTS statements work?

Sql Problem Overview


I'm trying to learn SQL and am having a hard time understanding EXISTS statements. I came across this quote about "exists" and don't understand something:

>Using the exists operator, your subquery can return zero, one, or many rows, and the condition simply checks whether the subquery returned any rows. If you look at the select clause of the subquery, you will see that it consists of a single literal (1); since the condition in the containing query only needs to know how many rows have been returned, the actual data the subquery returned is irrelevant.

What I don't understand is how the outer query knows which row the subquery is checking. For example:

SELECT *
  FROM suppliers
 WHERE EXISTS (select *
                 from orders
                where suppliers.supplier_id = orders.supplier_id);

I understand that if the ids from the suppliers and orders tables match, the subquery will return true and all the columns from the matching row in the suppliers table will be output. What I don't get is how the subquery communicates which specific row (let's say the row with supplier id 25) should be printed if only a true or false is being returned.

It appears to me that there is no relationship between the outer query and the subquery.

Sql Solutions


Solution 1 - Sql

Think of it this way:

For 'each' row from Suppliers, check whether there 'exists' a row in the Orders table that meets the condition Suppliers.supplier_id = Orders.supplier_id, where Suppliers.supplier_id comes from the outer query's current row. When you find the first matching row, stop right there: the WHERE EXISTS has been satisfied.

The magic link between the outer query and the subquery lies in the fact that Supplier_id gets passed from the outer query to the subquery for each row evaluated.

Or, to put it another way, the subquery is (logically) evaluated once for each row of the outer query.

The subquery is NOT executed once as a whole to produce a single 'true/false' that is then somehow matched against the outer query.
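To make the row-by-row evaluation concrete, here is a minimal sketch against a tiny made-up data set. Only supplier_id comes from the question; supplier_name, order_id, the column types and the sample rows are assumptions for illustration. It is written in plain SQL that should run in an empty scratch database on SQLite or PostgreSQL; other engines may need small syntax changes.

```sql
-- Hypothetical sample data: supplier 25 has orders, supplier 26 has none.
CREATE TABLE suppliers (supplier_id INTEGER PRIMARY KEY, supplier_name VARCHAR(50));
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, supplier_id INTEGER);

INSERT INTO suppliers (supplier_id, supplier_name) VALUES (25, 'Acme'), (26, 'Globex');
INSERT INTO orders (order_id, supplier_id) VALUES (1, 25), (2, 25);

-- Outer row 25: the subquery is effectively "... WHERE 25 = orders.supplier_id".
--   It finds order 1, so EXISTS is true and the row is kept.
-- Outer row 26: the subquery is effectively "... WHERE 26 = orders.supplier_id".
--   It finds nothing, so EXISTS is false and the row is dropped.
SELECT *
  FROM suppliers
 WHERE EXISTS (SELECT *
                 FROM orders
                WHERE suppliers.supplier_id = orders.supplier_id);
-- Returns the single row (25, 'Acme').
```

The true/false result never has to identify a row, because it is always computed for the supplier row currently being tested.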

Solution 2 - Sql

>It appears to me that there is no relationship between the outer query and the subquery.

What do you think the WHERE clause inside the EXISTS example is doing? How do you come to that conclusion when the SUPPLIERS reference isn't in the FROM or JOIN clauses within the EXISTS clause?

EXISTS evaluates to TRUE or FALSE, and exits with TRUE on the first match of the criteria; this is why it can be faster than IN. Also be aware that the SELECT clause in an EXISTS is ignored, e.g.:

SELECT s.*
  FROM SUPPLIERS s
 WHERE EXISTS (SELECT 1/0
                 FROM ORDERS o
                WHERE o.supplier_id = s.supplier_id)

...looks as if it should hit a division by zero error, but it won't, because the SELECT list is never evaluated. The WHERE clause is the most important piece of an EXISTS clause.

Also be aware that a JOIN is not a direct replacement for EXISTS, because the result will contain duplicate parent records whenever more than one child record is associated with the parent.
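The duplicate-row point can be seen with a small sketch. The table layouts and sample rows below are assumptions (only supplier_id appears in the question), and the script is meant for an empty scratch database on SQLite or PostgreSQL.

```sql
-- Hypothetical data: supplier 25 has three orders, supplier 26 has none.
CREATE TABLE suppliers (supplier_id INTEGER PRIMARY KEY);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, supplier_id INTEGER);

INSERT INTO suppliers (supplier_id) VALUES (25), (26);
INSERT INTO orders (order_id, supplier_id) VALUES (1, 25), (2, 25), (3, 25);

-- JOIN: one output row per matching order, so supplier 25 is returned three times.
SELECT s.supplier_id
  FROM suppliers s
  JOIN orders o ON o.supplier_id = s.supplier_id;

-- EXISTS: each supplier row is tested once, so supplier 25 is returned once.
SELECT s.supplier_id
  FROM suppliers s
 WHERE EXISTS (SELECT 1
                 FROM orders o
                WHERE o.supplier_id = s.supplier_id);
```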

Solution 3 - Sql

You can produce identical results using any of JOIN, EXISTS, IN, or INTERSECT:

SELECT s.supplier_id
FROM suppliers s
INNER JOIN (SELECT DISTINCT o.supplier_id FROM orders o) o
    ON o.supplier_id = s.supplier_id;

SELECT s.supplier_id
FROM suppliers s
WHERE EXISTS (SELECT * FROM orders o WHERE o.supplier_id = s.supplier_id);

SELECT s.supplier_id
FROM suppliers s
WHERE s.supplier_id IN (SELECT o.supplier_id FROM orders o);

SELECT s.supplier_id
FROM suppliers s
INTERSECT
SELECT o.supplier_id
FROM orders o;

Solution 4 - Sql

If you had a where clause that looked like this:

WHERE id in (25,26,27) -- and so on

you can easily understand why some rows are returned and some are not.

When the where clause is like this:

WHERE EXISTS (select * from orders where suppliers.supplier_id = orders.supplier_id);

it just means: return the rows that have an existing record in the orders table with the same id.

Solution 5 - Sql

Database table model

Let’s assume we have the following two tables in our database, that form a one-to-many table relationship.

The student table is the parent, and the student_grade is the child table since it has a student_id Foreign Key column referencing the id Primary Key column in the student table.

The student table contains the following two records:

| id | first_name | last_name | admission_score |
|----|------------|-----------|-----------------|
| 1  | Alice      | Smith     | 8.95            |
| 2  | Bob        | Johnson   | 8.75            |

And, the student_grade table stores the grades the students received:

| id | class_name | grade | student_id |
|----|------------|-------|------------|
| 1  | Math       | 10    | 1          |
| 2  | Math       | 9.5   | 1          |
| 3  | Math       | 9.75  | 1          |
| 4  | Science    | 9.5   | 1          |
| 5  | Science    | 9     | 1          |
| 6  | Science    | 9.25  | 1          |
| 7  | Math       | 8.5   | 2          |
| 8  | Math       | 9.5   | 2          |
| 9  | Math       | 9     | 2          |
| 10 | Science    | 10    | 2          |
| 11 | Science    | 9.4   | 2          |
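For anyone who wants to follow along, the two tables above could be created roughly as follows. The column names and rows are taken from the tables; the column types and constraint syntax are assumptions, and the script is a sketch for an empty scratch database on SQLite or PostgreSQL rather than the original schema.

```sql
-- Parent table: one row per student.
CREATE TABLE student (
    id              INTEGER PRIMARY KEY,
    first_name      VARCHAR(50),
    last_name       VARCHAR(50),
    admission_score DECIMAL(4, 2)
);

-- Child table: student_id is the Foreign Key pointing back at student.id.
CREATE TABLE student_grade (
    id         INTEGER PRIMARY KEY,
    class_name VARCHAR(50),
    grade      DECIMAL(4, 2),
    student_id INTEGER REFERENCES student (id)
);

INSERT INTO student (id, first_name, last_name, admission_score) VALUES
    (1, 'Alice', 'Smith',   8.95),
    (2, 'Bob',   'Johnson', 8.75);

INSERT INTO student_grade (id, class_name, grade, student_id) VALUES
    (1,  'Math',    10,   1),
    (2,  'Math',    9.5,  1),
    (3,  'Math',    9.75, 1),
    (4,  'Science', 9.5,  1),
    (5,  'Science', 9,    1),
    (6,  'Science', 9.25, 1),
    (7,  'Math',    8.5,  2),
    (8,  'Math',    9.5,  2),
    (9,  'Math',    9,    2),
    (10, 'Science', 10,   2),
    (11, 'Science', 9.4,  2);
```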

SQL EXISTS

Let’s say we want to get all students that have received a grade of 10 in the Math class.

If we are only interested in the student identifier, then we can run a query like this one:

SELECT
    student_grade.student_id
FROM
    student_grade
WHERE
    student_grade.grade = 10 AND
    student_grade.class_name = 'Math'
ORDER BY
    student_grade.student_id

But, the application is interested in displaying the full name of a student, not just the identifier, so we need info from the student table as well.

In order to filter the student records that have a 10 grade in Math, we can use the EXISTS SQL operator, like this:

SELECT
    id, first_name, last_name
FROM
    student
WHERE EXISTS (
    SELECT 1
    FROM
        student_grade
    WHERE
        student_grade.student_id = student.id AND
        student_grade.grade = 10 AND
        student_grade.class_name = 'Math'
)
ORDER BY id

When running the query above, we can see that only the Alice row is selected:

| id | first_name | last_name |
|----|------------|-----------|
| 1  | Alice      | Smith     |

The outer query selects the student row columns we are interested in returning to the client. However, the WHERE clause is using the EXISTS operator with an associated inner subquery.

The EXISTS operator returns true if the subquery returns at least one record and false if no row is selected. The database engine does not have to run the subquery entirely. If a single record is matched, the EXISTS operator returns true, and the associated outer query row is selected.

The inner subquery is correlated because the student_id column of the student_grade table is matched against the id column of the outer student table.
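One way to see the correlation at work is to run the inner subquery by hand with student.id replaced by each student's id in turn. The sketch below uses a cut-down copy of the data (three grade rows only) with assumed column types, and is meant for an empty scratch database on SQLite or PostgreSQL.

```sql
-- Reduced, hypothetical copy of the tables: just enough rows to show both outcomes.
CREATE TABLE student (id INTEGER PRIMARY KEY, first_name VARCHAR(50), last_name VARCHAR(50));
CREATE TABLE student_grade (id INTEGER PRIMARY KEY, class_name VARCHAR(50),
                            grade DECIMAL(4, 2), student_id INTEGER);

INSERT INTO student (id, first_name, last_name) VALUES (1, 'Alice', 'Smith'), (2, 'Bob', 'Johnson');
INSERT INTO student_grade (id, class_name, grade, student_id) VALUES
    (1,  'Math',    10,  1),
    (8,  'Math',    9.5, 2),
    (10, 'Science', 10,  2);

-- Subquery as seen for Alice (student.id = 1): returns one row, so EXISTS is true.
SELECT 1
  FROM student_grade
 WHERE student_grade.student_id = 1
   AND student_grade.grade = 10
   AND student_grade.class_name = 'Math';

-- Subquery as seen for Bob (student.id = 2): returns no rows, so EXISTS is false.
-- His 10 is in Science, which the class_name filter excludes.
SELECT 1
  FROM student_grade
 WHERE student_grade.student_id = 2
   AND student_grade.grade = 10
   AND student_grade.class_name = 'Math';
```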

Solution 6 - Sql

EXISTS means that the subquery returns at least one row; that's really it. In this case, it's a correlated subquery because it checks the supplier_id of the outer table against the supplier_id of the inner table. This query says, in effect:

SELECT all suppliers
For each supplier ID, see if an order exists for this supplier
If the supplier is not present in the orders table, remove the supplier from the results
RETURN all suppliers who have corresponding rows in the orders table

You could do the same thing in this case with an INNER JOIN.

SELECT suppliers.* 
  FROM suppliers 
 INNER 
  JOIN orders 
    ON suppliers.supplier_id = orders.supplier_id;

OMG Ponies' comment is correct: you'd need to group the results of that join, or use SELECT DISTINCT, depending on the data you need.
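As a rough illustration of those two options, the sketch below removes the duplicates from the join first with DISTINCT and then with GROUP BY. The table layouts and sample rows are assumptions (only supplier_id comes from the question), and it is intended for an empty scratch database on SQLite or PostgreSQL.

```sql
-- Hypothetical data: supplier 25 has two orders, supplier 26 has none.
CREATE TABLE suppliers (supplier_id INTEGER PRIMARY KEY);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, supplier_id INTEGER);

INSERT INTO suppliers (supplier_id) VALUES (25), (26);
INSERT INTO orders (order_id, supplier_id) VALUES (1, 25), (2, 25);

-- Option 1: DISTINCT collapses the two joined rows for supplier 25 into one.
SELECT DISTINCT suppliers.supplier_id
  FROM suppliers
 INNER JOIN orders
    ON suppliers.supplier_id = orders.supplier_id;

-- Option 2: GROUP BY produces one row per supplier that has at least one order.
SELECT suppliers.supplier_id
  FROM suppliers
 INNER JOIN orders
    ON suppliers.supplier_id = orders.supplier_id
 GROUP BY suppliers.supplier_id;
```

Both queries return only supplier 25, which matches what the EXISTS version returns for the same data.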

Solution 7 - Sql

What you describe is a query with a so-called correlated subquery.

In general, it's something you should try to avoid by writing the query with a join instead:

SELECT suppliers.*
FROM suppliers
JOIN orders USING (supplier_id)
GROUP BY suppliers.supplier_id;

Otherwise, the subquery will be executed for each row in the outer query.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

| Content Type     | Original Author      | Original Content on Stackoverflow |
|------------------|----------------------|-----------------------------------|
| Question         | Dan                  | View Question on Stackoverflow    |
| Solution 1 - Sql | sojin                | View Answer on Stackoverflow      |
| Solution 2 - Sql | OMG Ponies           | View Answer on Stackoverflow      |
| Solution 3 - Sql | Anthony Faull        | View Answer on Stackoverflow      |
| Solution 4 - Sql | Menahem              | View Answer on Stackoverflow      |
| Solution 5 - Sql | Vlad Mihalcea        | View Answer on Stackoverflow      |
| Solution 6 - Sql | David Fells          | View Answer on Stackoverflow      |
| Solution 7 - Sql | Wouter van Nifterick | View Answer on Stackoverflow      |