Does the order of where clauses matter in SQL?

SqlPerformanceWhere Clause

Sql Problem Overview


Let's say I have a table called PEOPLE having three columns, ID, LastName, and FirstName. None of these columns are indexed. LastName is more unique, and FirstName is less unique.

If I do two searches:

select * from PEOPLE where FirstName="F" and LastName="L"
select * from PEOPLE where LastName="L" and FirstName="F"

My belief is the second one is faster because the more unique criterion (LastName) comes first in the where clause, and records will get eliminated more efficiently. I don't think the optimizer is smart enough to optimize the first SQL query.

Is my understanding correct?

Sql Solutions


Solution 1 - Sql

No, that order doesn't matter (or at least: shouldn't matter).

Any decent query optimizer will look at all the parts of the WHERE clause and figure out the most efficient way to satisfy that query.

I know the SQL Server query optimizer will pick a suitable index - no matter which order you have your two conditions in. I assume other RDBMS will have similar strategies.

What does matter is whether or not you have a suitable index for this!

In the case of SQL Server, it will likely use an index if you have:

  • an index on (LastName, FirstName)
  • an index on (FirstName, LastName)
  • an index on just (LastName), or just (FirstName) (or both)

On the other hand - again for SQL Server - if you use SELECT * to grab all columns from a table, and the table is rather small, then there's a good chance the query optimizer will just do a table (or clustered index) scan instead of using an index (because the lookup into the full data page to get all other columns just gets too expensive very quickly).

Solution 2 - Sql

The order of WHERE clauses should not make a difference in a database that conforms to the SQL standard. The order of evaluation is not guaranteed in most databases.

Do not think that SQL cares about the order. The following generates an error in SQL Server:

select *
from INFORMATION_SCHEMA.TABLES
where ISNUMERIC(table_name) = 1 and CAST(table_name as int) <> 0

If the first part of this clause were executed first, then only numeric table names would be cast as integers. However, it fails, providing a clear example that SQL Server (as with other databases) does not care about the order of clauses in the WHERE statement.

Solution 3 - Sql

ANSI SQL Draft 2003 5WD-01-Framework-2003-09.pdf

6.3.3.3 Rule evaluation order

...

Where the precedence is not determined by the Formats or by parentheses, effective evaluation of expressions is generally performed from left to right. However, it is implementation-dependent whether expressions are actually evaluated left to right, particularly when operands or operators might cause conditions to be raised or if the results of the expressions can be determined without completely evaluating all parts of the expression.

copied from here

Solution 4 - Sql

No, all the RDBMs first start by analysing the query and optimize it by reordering your where clause.

Depending on which RDBM you are you using can display what is the result of the analyse (search for explain plan in oracle for instance)

M.

Solution 5 - Sql

It's true as far as it goes, assuming the names aren't indexed. Different data would make it wrong though. In order to find out which way to do it, which could differ every time, the DBMS would have to run a distinct count query for each column and compare the numbers, that would cost more than just shrugging and getting on with it.

Solution 6 - Sql

Original OP statement >My belief is the second one is faster because the more unique criterion (LastName) comes first in >the where clause, and records will get eliminated more efficiently. I don't think the optimizer is >smart enough to optimize the first sql.

I guess you are confusing this with selecting the order of columns while creating the indexes where you have to put the more selective columns first than second most selective and so on.

BTW, for the above two query SQL server optimizer will not do any optimization but will use Trivila plan as long as the total cost of the plan is less than parallelism threshold cost.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionZiyang ZhangView Question on Stackoverflow
Solution 1 - Sqlmarc_sView Answer on Stackoverflow
Solution 2 - SqlGordon LinoffView Answer on Stackoverflow
Solution 3 - Sql03UsrView Answer on Stackoverflow
Solution 4 - SqlpoussmaView Answer on Stackoverflow
Solution 5 - SqlTony HopkinsonView Answer on Stackoverflow
Solution 6 - SqlGulli MeelView Answer on Stackoverflow