CUBE and ROLLUP with GROUPBY in SQL Server

CUBE and ROLLUP are SQL Server operators which are always used along with a GROUP BY clause. These operators perform multi level aggregations at each column specified in the GROUP BY Clause. ROLLUP will do aggregation in the following manner:

Operation : ROLLUP(col1,col2)
Groupings : col1,col2
            col1, ALL(NULL)
            ALL (NULL)

CUBE will do aggregation in the following manner:

Operation : CUBE(col1,col2)
Groupings : col1,col2
            col1
        ALL(NULL), col2
            ALL(NULL)

Let us create the following dataset and explore the resultset:

create table #sales_data(region varchar(30), sales_date datetime, 
sales_amount decimal(12,2))

truncate table #sales_data
insert into #sales_data(region,sales_Date,sales_amount)
select 'South Asia', '2014-01-01',30000 union all
select 'South Asia', '2014-05-01',72000 union all
select 'South Asia', '2015-05-01',6700 union all
select 'North America', '2014-03-01',12500 union all
select 'North America', '2015-05-01',80000 union all
select 'North America', '2015-05-01',9000 union all
select 'Australia', '2014-02-01',88000 union all
select 'Australia', '2015-01-01',144000 union all
select 'Australia', '2015-05-01',178000 

Using ROLLUP

select region, year(sales_date) as sales_year, 
sum(sales_amount) as total_spent  from #sales_data
group by region, year(sales_date)
with ROLLUP

The result of the above statement is as follows:

region                         sales_year  total_spent
------------------------------ ----------- ---------------
Australia                      2014        88000.00
Australia                      2015        322000.00
Australia                      NULL        410000.00
North America                  2014        12500.00
North America                  2015        89000.00
North America                  NULL        101500.00
South Asia                     2014        102000.00
South Asia                     2015        6700.00
South Asia                     NULL        108700.00
NULL                           NULL        620200.00

As you can see, Grouping is done in three cases.

1. Grouping by the columns region and sales_year
2. Grouping by the columns region only
3. Grouping by no columns

Using CUBE

select region, year(sales_date) as sales_year, 
sum(sales_amount) as total_spent  from #sales_data
group by region, year(sales_date)
with CUBE

The result of the above statement is as follows:

region                         sales_year  total_spent
------------------------------ ----------- -----------------
Australia                      2014        88000.00
North America                  2014        12500.00
South Asia                     2014        102000.00
NULL                           2014        202500.00
Australia                      2015        322000.00
North America                  2015        89000.00
South Asia                     2015        6700.00
NULL                           2015        417700.00
NULL                           NULL        620200.00
Australia                      NULL        410000.00
North America                  NULL        101500.00
South Asia                     NULL        108700.00

As you can see, Grouping is done in four cases.

1. Grouping by the columns region and sales_year
2. Grouping by the column sales_year only
3. Grouping by the column region
4. Grouping by no columns

You can use both of these operators to achieve aggregation at multiple levels. CUBE performs more number of aggregations with multiple combination of the grouping columns.

If you just want to perform multiple groupings not excluding the first column, you can use ROLLUP, otherwise use CUBE.

Note: Usage of ROLLUP and CUBE in SQL Server are time-consuming options as they do groupings at multiple level. If you use a reporting tool to display the result, you should avoid doing it in SQL and do it in the reporting tool itself .


About The Author

Madhivanan,an MSc computer Science graduate from Chennai-India, works as a works as a Lead Subject Matter Expert at a company that simplifies BIG data. He started his career as a developer working with Visual Basic 6.0, SQL Server 2000 and Crystal Report 8. As years went by, he started working more on writing queries in SQL Server. He now has good level of knowledge in SQLServer, Oracle, MySQL and PostgreSQL as well. He is also one of the leading posters at www.sqlteam.com and a moderator at www.sql-server-performance.com. His T-sql blog is at http://beyondrelational.com/blogs/madhivanan

No comments: