Will ANSI JOIN vs. non-ANSI JOIN queries perform differently?

Sql Server Problem Overview


I have my business logic in ~7000 lines of T-SQL stored procedures, and most of them use the following JOIN syntax:

SELECT A.A, B.B, C.C
FROM aaa AS A, bbb AS B, ccc AS C
WHERE
    A.B = B.ID
AND B.C = C.ID
AND C.ID = @param

Will I get a performance improvement if I replace such queries with this:

SELECT A.A, B.B, C.C
FROM aaa AS A
JOIN bbb AS B
   ON A.B = B.ID
JOIN ccc AS C
   ON B.C = C.ID
   AND C.ID = @param

Or are they the same?

Sql Server Solutions


Solution 1 - Sql Server

The two queries are the same, except the second is ANSI-92 SQL syntax and the first is the older SQL syntax which didn't incorporate the join clause. They should produce exactly the same internal query plan, although you may like to check.

You should use the ANSI-92 syntax for several reasons:

  • The use of the JOIN clause separates the relationship logic from the filter logic (the WHERE) and is thus cleaner and easier to understand.
  • It doesn't matter with this particular query, but there are a few circumstances where the older outer join syntax (the *= and =* operators in SQL Server, or (+) in Oracle) is ambiguous and the query results are hence implementation dependent - or the query cannot be resolved at all. These do not occur with ANSI-92.
  • It's good practice, as most developers and DBAs use ANSI-92 nowadays and you should follow the standard. Certainly all modern query tools will generate ANSI-92.
  • As pointed out by @gbn, it does tend to avoid accidental cross joins.

I resisted ANSI-92 myself for some time, as there is a slight conceptual advantage to the old syntax: it's easier to envisage the SQL as a mass Cartesian join of all tables used followed by a filtering operation, a mental technique that can be useful for grasping what a SQL query is doing. However, I decided a few years ago that I needed to move with the times, and after a relatively short adjustment period I now strongly prefer it, predominantly because of the first reason given above. The only place where one should depart from the ANSI-92 syntax, or rather not use the option, is with natural joins, which are implicitly dangerous.

Solution 2 - Sql Server

The second construct is known as the "infixed join syntax" in the SQL community. The first construct AFAIK doesn't have a widely accepted name, so let's call it the 'traditional' inner join syntax.

The usual arguments go like this:

Pros of the 'Traditional' syntax: the predicates are physically grouped together in the WHERE clause in whatever order which makes the query generally, and n-ary relationships particularly, easier to read and understand (the ON clauses of the infixed syntax can spread out the predicates so you have to look for the appearance of one table or column over a visual distance).

Cons of the 'Traditional' syntax: There is no parse error when omitting one of the 'join' predicates and the result is a Cartesian product (known as a CROSS JOIN in the infixed syntax) and such an error can be tricky to detect and debug. Also, 'join' predicates and 'filtering' predicates are physically grouped together in the WHERE clause, which can cause them to be confused for one another.
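
The silent Cartesian product described above can be reproduced with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the two small tables and their rows are assumptions made up for illustration. Note that SQLite, unlike SQL Server, also accepts a JOIN without an ON clause, so only the comma-syntax side of the argument is shown here.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bbb (ID INTEGER PRIMARY KEY, B TEXT);
    CREATE TABLE aaa (A TEXT, B INTEGER REFERENCES bbb(ID));
    INSERT INTO bbb VALUES (1, 'b1'), (2, 'b2'), (3, 'b3');
    INSERT INTO aaa VALUES ('a1', 1), ('a2', 2), ('a3', 3);
""")

# Intended query: the join predicate is present in the WHERE clause.
joined = conn.execute(
    "SELECT A.A, B.B FROM aaa AS A, bbb AS B WHERE A.B = B.ID"
).fetchall()

# Same query with the join predicate forgotten: still valid SQL,
# but every row of aaa is now paired with every row of bbb.
forgotten = conn.execute(
    "SELECT A.A, B.B FROM aaa AS A, bbb AS B"
).fetchall()

print(len(joined), len(forgotten))
```

With three rows in each table, the first query returns 3 rows and the second returns 9, with no error or warning to flag the difference.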

Solution 3 - Sql Server

The two queries are equal - the first uses non-ANSI JOIN syntax, the second uses ANSI JOIN syntax. I recommend sticking with the ANSI JOIN syntax.

And yes, LEFT OUTER JOINs (which, btw are also ANSI JOIN syntax) are what you want to use when there's a possibility that the table you're joining to might not contain any matching records.

Reference: Conditional Joins in SQL Server

Solution 4 - Sql Server

OK, they execute the same. That's agreed. Unlike many, I use the older convention. That SQL-92 is "easier to understand" is debatable. Having written in programming languages for pushing 40 years (gulp), I know that 'easy to read' begins first, before any other convention, with 'visual acuity' (a misapplied term here, but it's the best phrase I can use). When reading SQL, the FIRST thing your mind cares about is which tables are involved and then which table (mostly) defines the grain. Then you care about the relevant constraints on the data, then the attributes selected. While SQL-92 mostly separates these ideas out, there are so many noise words that the mind's eye has to interpret and deal with, and it makes reading the SQL slower.

SELECT Mgt.attrib_a   AS attrib_a
      ,Sta.attrib_b   AS attrib_b
      ,Stb.attrib_c   AS attrib_c
FROM   Main_Grain_Table  Mgt
      ,Surrounding_TabA  Sta
      ,Surrounding_tabB  Stb
WHERE  Mgt.sta_join_col  = Sta.sta_join_col
AND    Mgt.stb_join_col  = Stb.stb_join_col
AND    Mgt.bus_logic_col = 'TIGHT'

Visual Acuity!

  • Put the commas for new attributes in front. It makes commenting code easier too.
  • Use a specific case for functions and keywords.
  • Use a specific case for tables.
  • Use a specific case for attributes.
  • Vertically line up operators and operations.
  • Make the first table(s) in the FROM represent the grain of the data.
  • Make the first predicates of the WHERE be join constraints and let the specific, tight constraints float to the bottom.
  • Select a 3-character alias for ALL tables in your database and use the alias EVERYWHERE you reference the table. You should use that alias as a prefix for (many) indexes on that table as well.

Six of one, half a dozen of the other, right? Maybe. But even if you're using the ANSI-92 convention (as I have and in some cases will continue to do), use visual acuity principles and vertical alignment to let your mind's eye go to the places you want to see and easily skip the things (particularly noise words) you don't need.

Solution 5 - Sql Server

Execute both and check their query plans. They should be equal.
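
As a rough illustration of that check, the sketch below runs both forms of the question's query and collects their results and plans. It is only an analogy: it uses Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN rather than SQL Server, and the table definitions and sample rows are assumptions. In SQL Server itself, the equivalent comparison would be done on the execution plans shown by Management Studio or returned after SET SHOWPLAN_XML ON.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ccc (ID INTEGER PRIMARY KEY, C TEXT);
    CREATE TABLE bbb (ID INTEGER PRIMARY KEY, B TEXT, C INTEGER);
    CREATE TABLE aaa (A TEXT, B INTEGER);
    INSERT INTO ccc VALUES (1, 'c1'), (2, 'c2');
    INSERT INTO bbb VALUES (10, 'b10', 1), (20, 'b20', 2);
    INSERT INTO aaa VALUES ('a1', 10), ('a2', 20), ('a3', 10);
""")

OLD_STYLE = """
    SELECT A.A, B.B, C.C
    FROM aaa AS A, bbb AS B, ccc AS C
    WHERE A.B = B.ID AND B.C = C.ID AND C.ID = ?
"""
ANSI_STYLE = """
    SELECT A.A, B.B, C.C
    FROM aaa AS A
    JOIN bbb AS B ON A.B = B.ID
    JOIN ccc AS C ON B.C = C.ID AND C.ID = ?
"""

def rows(sql, param):
    # Sort so the comparison does not depend on row order.
    return sorted(conn.execute(sql, (param,)).fetchall())

def plan(sql, param):
    # The last column of each EXPLAIN QUERY PLAN row is the readable step.
    cursor = conn.execute("EXPLAIN QUERY PLAN " + sql, (param,))
    return [row[-1] for row in cursor]

old_rows, ansi_rows = rows(OLD_STYLE, 1), rows(ANSI_STYLE, 1)
old_plan, ansi_plan = plan(OLD_STYLE, 1), plan(ANSI_STYLE, 1)

print("same rows:", old_rows == ansi_rows)
print("old-style plan:", old_plan)
print("ANSI plan:     ", ansi_plan)
```

Both queries return the same rows here. The plan text depends on the engine and its version, so it is printed for side-by-side inspection rather than assumed.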

Solution 6 - Sql Server

In my mind the FROM clause is where I decide what columns I need in the rows for my SELECT clause to work on. It is where a business rule is expressed that brings onto the same row the values needed in calculations. The business rule can be customers who have invoices, resulting in rows of invoices including the customer responsible. It could also be venues in the same postcode as clients, resulting in a list of venues and clients that are close together.

It is where I work out the centricity of the rows in my result set. After all, we are simply shown the metaphor of a list in RDBMSs, each list having a topic (the entity) and each row being an instance of the entity. If the row centricity is understood, the entity of the result set is understood.

The WHERE clause, which conceptually executes after the rows are defined in the FROM clause, culls rows not required (or includes rows that are required) for the SELECT clause to work on.

Because join logic can be expressed in both the FROM clause and the WHERE clause, and because the clauses exist to divide and conquer complex logic, I choose to put join logic that involves values in columns in the FROM clause because that is essentially expressing a business rule that is supported by matching values in columns.

i.e. I won't write a WHERE clause like this:

 WHERE Column1 = Column2

I will put that in the FROM clause like this:

 ON Column1 = Column2

Likewise, if a column is to be compared to external values (values that may or may not be in a column) such as comparing a postcode to a specific postcode, I will put that in the WHERE clause because I am essentially saying I only want rows like this.

i.e. I won't write a FROM clause like this:

 ON PostCode = '1234'

I will put that in the WHERE clause like this:

 WHERE PostCode = '1234'
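
For inner joins this placement is a matter of style, but with outer joins (mentioned in Solution 3) it can change the result. The sketch below is an illustration only: it uses Python's built-in sqlite3 module in place of SQL Server, and the clients and venues tables, their columns, and the helper function run are hypothetical names chosen to match the postcode example above.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clients (id INTEGER PRIMARY KEY, PostCode TEXT);
    CREATE TABLE venues  (id INTEGER PRIMARY KEY, PostCode TEXT);
    INSERT INTO clients VALUES (1, '1234'), (2, '5678');
    INSERT INTO venues  VALUES (10, '1234'), (20, '5678');
""")

def run(join_kind, on_extra="", where=""):
    # Build the same query with the postcode filter in ON or in WHERE.
    sql = f"""
        SELECT c.id, v.id
        FROM clients AS c
        {join_kind} venues AS v
          ON v.PostCode = c.PostCode {on_extra}
        {where}
        ORDER BY c.id
    """
    return conn.execute(sql).fetchall()

inner_on    = run("JOIN",      on_extra="AND v.PostCode = '1234'")
inner_where = run("JOIN",      where="WHERE v.PostCode = '1234'")
left_on     = run("LEFT JOIN", on_extra="AND v.PostCode = '1234'")
left_where  = run("LEFT JOIN", where="WHERE v.PostCode = '1234'")

print(inner_on == inner_where)  # inner join: placement does not matter
print(left_on)                  # filter in ON keeps the unmatched client
print(left_where)               # filter in WHERE removes it
```

With the filter in the ON clause, the LEFT JOIN keeps client 2 with a NULL venue; with the filter in the WHERE clause, that row is culled after the join, which is the "I only want rows like this" behaviour described above.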

Solution 7 - Sql Server

ANSI syntax enforces neither predicate placement in the proper clause (be that ON or WHERE), nor the affinity of an ON clause to the adjacent table reference. A developer is free to write a mess like this:

SELECT
   C.FullName,
   C.CustomerCode,
   O.OrderDate,
   O.OrderTotal,
   OD.ExtendedShippingNotes
FROM
   Customer C
   CROSS JOIN [Order] O
   INNER JOIN OrderDetail OD
      ON C.CustomerID = O.CustomerID
      AND C.CustomerStatus = 'Preferred'
      AND O.OrderTotal > 1000.0
WHERE
   O.OrderID = OD.OrderID;

Speaking of query tools that "will generate ANSI-92", I'm commenting here because one of them generated this:

SELECT 1
   FROM DEPARTMENTS C
        JOIN EMPLOYEES A
             JOIN JOBS B
     ON C.DEPARTMENT_ID = A.DEPARTMENT_ID
     ON A.JOB_ID = B.JOB_ID

The only operation that escapes the conventional "restrict-project-Cartesian product" model is the outer join. This operation is more complicated because it is not associative (neither with itself nor with a normal join). At the very least, one has to parenthesize a query with an outer join judiciously. However, it is an exotic operation; if you are using it too often, I suggest taking a relational database class.
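
The point about grouping can be made concrete with a small sketch showing that an outer join followed by an inner join is not the same as an outer join onto the result of an inner join. This is an illustration under assumptions: it runs on SQLite through Python's built-in sqlite3 module rather than on SQL Server, the tables a, b and c are hypothetical, and a derived table stands in for the parenthesized join to make the grouping explicit.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, b_id INTEGER);
    CREATE TABLE b (id INTEGER PRIMARY KEY);
    CREATE TABLE c (id INTEGER PRIMARY KEY, b_id INTEGER);
    INSERT INTO a VALUES (1, 1), (2, 99);   -- a.id = 2 has no matching b
    INSERT INTO b VALUES (1);
    INSERT INTO c VALUES (100, 1);
""")

# Grouping 1: (a LEFT JOIN b) JOIN c
# The inner join on b.id discards the NULL-extended row for a.id = 2.
left_first = conn.execute("""
    SELECT a.id, c.id
    FROM a
    LEFT JOIN b ON a.b_id = b.id
    JOIN c      ON c.b_id = b.id
    ORDER BY a.id
""").fetchall()

# Grouping 2: a LEFT JOIN (b JOIN c)
# The inner join happens first, so every row of a is preserved.
inner_first = conn.execute("""
    SELECT a.id, bc.c_id
    FROM a
    LEFT JOIN (
        SELECT b.id AS b_id, c.id AS c_id
        FROM b
        JOIN c ON c.b_id = b.id
    ) AS bc ON a.b_id = bc.b_id
    ORDER BY a.id
""").fetchall()

print(left_first)
print(inner_first)
```

The first grouping loses the row for a.id = 2, while the second keeps it with a NULL in place of the c value, so the two groupings are not interchangeable.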

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | abatishchev | View Question on Stackoverflow
Solution 1 - Sql Server | Cruachan | View Answer on Stackoverflow
Solution 2 - Sql Server | onedaywhen | View Answer on Stackoverflow
Solution 3 - Sql Server | OMG Ponies | View Answer on Stackoverflow
Solution 4 - Sql Server | TwoEdgedSword | View Answer on Stackoverflow
Solution 5 - Sql Server | sisve | View Answer on Stackoverflow
Solution 6 - Sql Server | John | View Answer on Stackoverflow
Solution 7 - Sql Server | Tegiri Nenashi | View Answer on Stackoverflow