Understanding the differences between CUBE and ROLLUP

SqlSql ServerSql Server-2008Tsql

Sql Problem Overview


My assignment asked me to find out "how many invoices are written for each date?"

I was a little stuck and asked my professor for help. She emailed me a query that would answer the question, "How many stoves of each type and version have been built? For a challenge but no extra points, include the total number of stoves."

This was the query she sent me:

SELECT STOVE.Type + STOVE.Version AS 'Type+Version'
, COUNT(*) AS 'The Count'
FROM STOVE
GROUP BY STOVE.Type + STOVE.Version WITH ROLLUP;

So, I tweaked that query until it met my needs. This is what I came up with:

SELECT InvoiceDt
, COUNT(InvoiceNbr) AS 'Number of Invoices' 
FROM INVOICE 
GROUP BY InvoiceDt WITH ROLLUP 
ORDER BY InvoiceDt ASC;

And it returned the following results that I wanted.

Anyway, I decided to read up on the ROLLUP clause and started with an article from Microsoft. It said that the ROLLUP clause was similar to the CUBE clause but that it was distinguished from the CUBE clause in the following way:

  1. CUBE generates a result set that shows aggregates for all combinations of values in the selected columns.
  2. ROLLUP generates a result set that shows aggregates for a hierarchy of values in the selected columns.

So, I decided to replace the ROLLUP in my query with CUBE to see what would happen. They produced the same results. I guess that's where I'm getting confused.

It seems like, if you're using the type of query that I am here, that there isn't any practical difference between the two clauses. Is that right? Or, am I not understanding something? I had thought, when I finished reading the Microsoft article, that my results should've been different using the CUBE clause.

Sql Solutions


Solution 1 - Sql

You won't see any difference since you're only rolling up a single column. Consider an example where we do

ROLLUP (YEAR, MONTH, DAY)

With a ROLLUP, it will have the following outputs:

YEAR, MONTH, DAY
YEAR, MONTH
YEAR
()

With CUBE, it will have the following:

YEAR, MONTH, DAY
YEAR, MONTH
YEAR, DAY
YEAR
MONTH, DAY
MONTH
DAY
()

CUBE essentially contains every possible rollup scenario for each node whereas ROLLUP will keep the hierarchy in tact (so it won't skip MONTH and show YEAR/DAY, whereas CUBE will)

This is why you didn't see a difference since you only had a single column you were rolling up.

Hope that helps.

Solution 2 - Sql

We can understand the difference between ROLLUP and CUBE with a simple example. Consider we have a table which contains the results of quarterly test of students. In certain cases we need to see the total corresponding to the quarter as well as the students. Here is the sample table

SELECT * INTO #TEMP
FROM
(
    SELECT 'Quarter 1' PERIOD,'Amar' NAME ,97 MARKS
    UNION ALL
    SELECT 'Quarter 1','Ram',88 
    UNION ALL
    SELECT 'Quarter 1','Simi',76 
    UNION ALL
    SELECT 'Quarter 2','Amar',94 
    UNION ALL
    SELECT 'Quarter 2','Ram',82 
    UNION ALL
    SELECT 'Quarter 2','Simi',71 
    UNION ALL
    SELECT 'Quarter 3' ,'Amar',95 
    UNION ALL
    SELECT 'Quarter 3','Ram',83 
    UNION ALL
    SELECT 'Quarter 3','Simi',77
    UNION ALL
    SELECT 'Quarter 4' ,'Amar',91 
    UNION ALL
    SELECT 'Quarter 4','Ram',84 
    UNION ALL
    SELECT 'Quarter 4','Simi',79
)TAB

enter image description here

1. ROLLUP(Can find total for corresponding to one column)

(a) Get total score of each student in all quarters.

SELECT * FROM #TEMP
UNION ALL
SELECT PERIOD,NAME,SUM(MARKS) TOTAL
FROM #TEMP
GROUP BY NAME,PERIOD 
WITH ROLLUP
HAVING PERIOD IS NULL AND NAME IS NOT NULL 
// Having is used inorder to emit a row that is the total of all totals of each student

Following is the result of (a)

enter image description here

(b) Incase you need to get total score of each quarter

SELECT * FROM #TEMP
UNION ALL
SELECT PERIOD,NAME,SUM(MARKS) TOTAL
FROM #TEMP
GROUP BY PERIOD,NAME 
WITH ROLLUP
HAVING PERIOD IS NOT NULL AND NAME IS NULL

Following is the result of (b)

enter image description here

2. CUBE(Find total for Quarter as well as students in a single shot)

SELECT PERIOD,NAME,SUM(MARKS) TOTAL
FROM #TEMP
GROUP BY NAME,PERIOD 
WITH CUBE
HAVING PERIOD IS NOT NULL OR NAME IS NOT NULL

Following is the result of CUBE

enter image description here

Now you may be wondering about the real time use of ROLLUP and CUBE. Sometimes we need a report in which we need to see the total of each quarter and total of each student in a single shot. Here is an example

I am changing the above CUBE query slightly as we need total for both totals.

SELECT CASE WHEN PERIOD IS NULL THEN 'TOTAL' ELSE PERIOD END PERIOD,
CASE WHEN NAME IS NULL THEN 'TOTAL' ELSE NAME END NAME,
SUM(MARKS) MARKS
INTO #TEMP2
FROM #TEMP
GROUP BY NAME,PERIOD 
WITH CUBE

DECLARE @cols NVARCHAR (MAX)

SELECT @cols = COALESCE (@cols + ',[' + PERIOD + ']', 
               '[' + PERIOD + ']')
               FROM    (SELECT DISTINCT PERIOD FROM #TEMP2) PV  
               ORDER BY PERIOD    

               
DECLARE @query NVARCHAR(MAX)
SET @query = 'SELECT * FROM 
             (
                 SELECT * FROM #TEMP2
             ) x
             PIVOT 
             (
                 SUM(MARKS)
                 FOR [PERIOD] IN (' + @cols + ')
            ) p;' 

EXEC SP_EXECUTESQL @query

Now you will get the following result

enter image description here

Solution 3 - Sql

This is because you only have one column that you are grouping by.

Add Group by InvoiceDt, InvoiceCountry (or whatever field will give you more data.

With Cube will give you a Sum for each InvoiceDt and you will get a Sum for each InvoiceCountry.

Solution 4 - Sql

You can find more detail about GROUPING SET, CUBE, ROLL UP. TL;DR they just expand GROUP BY + UNION ALL in some ways to get aggregation :)

https://technet.microsoft.com/en-us/library/bb510427(v=sql.105).aspx

Solution 5 - Sql

All Voted answers are good.


One Important difference in general is

  1. The N elements of a ROLLUP specification correspond to N+1 GROUPING SETS.
  2. The N elements of a CUBE specification correspond to 2^N GROUPING SETS.

Further reading see my article with respect to spark sql

For example :

store_id,product_type

rollup is equivalent to

GROUP BY store_id,product_type
 GROUPING SETS (
(store_id,product_type)
,(product_type)
, ())

for 2 (n) group by columns grouping set has (n+1 ) = 3 combinations of columns

Cube is equivalent to

GROUP BY store_id,product_type
 GROUPING SETS (
(store_id,product_type)
,(store_id)
,(product_type)
, ())

for 2 (n) group by columns grouping set has (2^n ) = 4 combinations of columns

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionkrebshackView Question on Stackoverflow
Solution 1 - SqlDerek KrommView Answer on Stackoverflow
Solution 2 - SqlSarath KSView Answer on Stackoverflow
Solution 3 - SqlJeffOView Answer on Stackoverflow
Solution 4 - Sqlo0omycomputero0oView Answer on Stackoverflow
Solution 5 - SqlRam GhadiyaramView Answer on Stackoverflow