SQL performance on LEFT OUTER JOIN vs NOT EXISTS

SqlSql Server

Sql Problem Overview


If I want to find a set of entries in table A but not in table B, I can use either LEFT OUTER JOIN or NOT EXISTS. I've heard SQL Server is geared towards ANSI and in some case LEFT OUTER JOINs are far more efficient than NOT EXISTS. Will ANSI JOIN perform better in this case? and are join operators more efficient than NOT EXISTS in general on SQL Server?

Sql Solutions


Solution 1 - Sql

Joe's link is a good starting point. Quassnoi covers this too.

In general, if your fields are properly indexed, OR if you expect to filter out more records (i.e. have a lots of rows EXIST in the subquery) NOT EXISTS will perform better.

EXISTS and NOT EXISTS both short circuit - as soon as a record matches the criteria it's either included or filtered out and the optimizer moves on to the next record.

LEFT JOIN will join ALL RECORDS regardless of whether they match or not, then filter out all non-matching records. If your tables are large and/or you have multiple JOIN criteria, this can be very very resource intensive.

I normally try to use NOT EXISTS and EXISTS where possible. For SQL Server, IN and NOT IN are semantically equivalent and may be easier to write. These are among the only operators you will find in SQL Server that are guaranteed to short circuit.

Solution 2 - Sql

Personally, I think that this one gets a big old, "It Depends". I've seen instances where each method has outperformed the other.

Your best bet is to test both and see which performs better. If it's a situation where the tables will always be small and performance isn't as crucial then I'd just go with whichever is the clearest to you (that's usually NOT EXISTS for most people) and move on.

Solution 3 - Sql

This blog entry gives examples of various ways ( NOT IN, OUTER APPLY, LEFT OUTER JOIN, EXCEPT and NOT EXISTS ) to achieve same results and proves that Not Exists ( Left Anti Semi Join) is the best options in both cold cache and warm cache scenarios.

Solution 4 - Sql

I've been wondering how we can use the index on the table we are deleting from in these cases that the OP describes.

Say we have:

 table EMPLOYEE (emp_id int, name varchar) 
and
 table EMPLOYEE_LOCATION (emp_id int, loc_id int)

In my real world example my tables are much wider and contain 1million + rows, I have simplified the schema for example purpose.

If I want to delete the rows from EMPLOYEE_LOCATION that don't have corresponding emp_id's in EMPLOYEE I can obviously use the Left outer technique or the NOT IN but I was wondering...

If both tables have indexes with leading column of emp_id then would it be worthwhile trying to use them?

Perhaps I could pull the emp_id's from EMPLOYEE, the emp_id's from EMPLOYEE_LOCATION into a temp table and get the emp_id's from the temp tables that I want to delete.

I could then cycle round these emp_id's and actually use the index like so:

loop for each emp_id X to delete -- (this would be a cursor)
 DELETE EMPLOYEE_LOCATION WHERE emp_id = X

I know there is overhead with the cursor but in my real example I am dealing with huge tables so I think explicitly using the index is desirable.

Solution 5 - Sql

> Answer on dba.stackexchange

An exception I've noticed to the NOT EXISTS being superior (however marginally) to LEFT JOIN ... WHERE IS NULL is when using Linked Servers.

From examining the execution plans, it appears that NOT EXISTS operator gets executed in a nested loop fashion. Whereby it is executed on a per row basis (which I suppose makes sense).

Example execution plan demonstrating this behaviour: enter image description here

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionkefeizhouView Question on Stackoverflow
Solution 1 - SqlJNKView Answer on Stackoverflow
Solution 2 - SqlTom HView Answer on Stackoverflow
Solution 3 - SqlN30View Answer on Stackoverflow
Solution 4 - Sqluser3257560View Answer on Stackoverflow
Solution 5 - SqlpimView Answer on Stackoverflow