Does the order of columns matter in a group by clause?

SqlSql ServerGroup By

Sql Problem Overview


If I have two columns, one with very high cardinality and one with very low cardinality (unique # of values), does it matter in which order I group by?

Here's an example:

select 
     dimensionName, 
     dimensionCategory, 
     sum(someFact)
from SomeFact f
join SomeDim d on f.dimensionKey = d.dimensionKey
group by 
    d.dimensionName,  -- large number of unique values
    d.dimensionCategory -- small number of unique values

Are there situations where it matters?

Sql Solutions


Solution 1 - Sql

No, the order doesn't matter for the GROUP BY clause.

MySQL and SQLite are the only databases I'm aware of that allow you to select columns which are omitted from the group by (non-standard, not portable) but the order doesn't matter there either.

Solution 2 - Sql

SQL is declarative.

In this case, you have told the optimiser how you want the data grouped and it works out how to do it.

It won't evaluate line by line (procedural) and look at one column first

The main place column order matters is for indexes. col1, col2 is not the same as col2, col1. At all.

Solution 3 - Sql

There's a legacy, non-standard feature of Microsoft SQL Server called ROLLUP. ROLLUP is an extension to the GROUP BY syntax and when it is used the order of the GROUP BY columns determines which columns should be grouped in the result. ROLLUP is deprecated however. The standard SQL alternative is to use grouping sets, which is supported by SQL Server 2008 and later versions.

Solution 4 - Sql

Since this has not been mentioned here. The answers above are correct i.e. the order of the columns after the "group by" clause will not affect the correctness of the query (i.e. the sum amount).

However, the order of the rows being retrieved will vary based on the order of the columns specified after the "group by" clause. For example consider Table A with the following rows:

Col1 Col2 Col3
1	xyz	100
2	abc	200
3	xyz	300
3	xyz	400

SELECT *, SUM(Col3) FROM A GROUP BY Col2, Col1 will retrieve rows ordered by the Col2 in ascending order.

Col1 Col2 Col3 sum(Col3)
2	abc	200	200
1	xyz	100	100
3	xyz	300	700

Now change the ordering of column in group by to Col1, Col2. The retrieved rows are ordered asc by Col1.

i.e. select *, sum(Col3) from A group by Col1, Col2

Col1 Col2 Col3 sum(Col3)
1	xyz	100	100
2	abc	200	200
3	xyz	300	700

Note: The the summation amount (i.e. the correctness of the query) remains exactly the same.

Solution 5 - Sql

If I have two columns, one with very high cardinality and one with very low cardinality (unique # of values), does it matter in which order I group by?

Query-1

SELECT spec_id, catid, spec_display_value, COUNT(*) AS cnt  FROM tbl_product_spec 
GROUP BY spec_id, catid, spec_display_value ;

Query-2

SELECT spec_id, catid, spec_display_value, COUNT(*) AS cnt  FROM tbl_product_spec FORCE INDEX(idx_comp_spec_cnt)
GROUP BY catid, spec_id,spec_display_value;

Both are equal , order doesn't work in group by clause.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionJeff Meatball YangView Question on Stackoverflow
Solution 1 - SqlOMG PoniesView Answer on Stackoverflow
Solution 2 - SqlgbnView Answer on Stackoverflow
Solution 3 - SqlnvogelView Answer on Stackoverflow
Solution 4 - SqlAaCodesView Answer on Stackoverflow
Solution 5 - SqlGauravkView Answer on Stackoverflow