Data Analytics
Analytics SQL SQL Server T-SQL

How to use PIVOT to aggregate column data in SQL queries

Welcome to today’s post.

In today’s post I will explain what a pivot query is in SQL Server T-SQL and how it can be used to solve some data queries that would otherwise be more challenging to extract in application development.

In a previous post I showed how to use the OVER and PARTITION BY clauses in a SELECT SQL query to perform aggregated operations over a partitioned group of results, which are ordered by a window function (ROW(), RANK()).

Instead of partitioning and aggregating data, we will look as how to pivot columns of data in a query.

What is the SQL PIVOT Operator?

A PIVOT operator is used to transform a result that is table-valued into a column-valued result set.

What this means is that a table result like the following:

Column1 Column2 … ColumnN
Value11 Value12 … Value1N
…       …       … …
ValueM1 ValueM2	… ValueMN

The above Is then transformed into a column valued result like the following:

Column1        Pivot-Column-1 Pivot-Column-2 … Pivot-Column-N
Column-Value1  Value1         Value2           ValueN
…              …              …              … …
Column-ValueM1 Value1         Value2         … ValueN

Where

Value[i] in the columns is the aggregation of data grouped by the column value Column1-Value[i].

To get from the table-valued result to the column-valued result I will show an example of a table-valued result from a SELECT … GROUP BY query.

We start with some sample data:

SELECT 	[ID],
     	[ProductName]
      	[SaleAmount]
      	[SaleDate]
FROM	[ProductSales]

The results show as:

ID	ProductName	SaleAmount	SaleDate
1	Apples      100         2021-01-01
2	Oranges     200         2021-01-15
3	Bananas     150         2021-02-01
4	Apples      200         2021-02-15
5	Pears       100         2021-02-15
6	Apples      150         2021-03-01
7	Bananas	    100         2021-03-15
8	Apples      250         2021-04-01
9	Oranges	    50          2021-04-15
10	Apples      300         2021-05-01
11	Pears       100         2021-05-15
12	Bananas     150         2021-05-15
13	Apples      150         2021-06-01
14	Pears       150         2021-06-15

In the next section, I will show how to turn a row of columnar data into a cross tab report.

Rotating Row Data into a Cross Tab Report

Suppose that I want to create a report that rotates the above result set into a cross tab column-valued report where the aggregations are the column values of the result, and the groupings of the of the aggregations are by the column names.

Essentially, the column names in the pivot table will be buckets that we allocate each record amount to during the grouping of data.

The more familiar table valued method to aggregate the above data is to use a familiar GROUP BY select query as shown below where we can aggregate sale amounts by the product name:

SELECT    [ProductName], Sales=SUM([SaleAmount])
FROM      [ProductSales]
GROUP BY  [ProductName]

Or aggregate sales amount by each month of the year a product has a sale:

SELECT    [ProductName], Month=MONTH([SaleDate]), Sales=SUM([SaleAmount])
FROM      [ProductSales]
GROUP BY  [ProductName], MONTH([SaleDate])

The transform we have in mind is the following is a 6-month columnar report:

ProductName  Jan Feb Mar Apr May Jun
[Product 1]  v11 v12 v13 v14 v15 v16
[Product 2]  v21 v22 v23 v24 v25 v26
…
[Product N]  vN1 vN2 vN3 vN4 vN5 vN6

Where:

[Product1] … [ProductN] are the products that will be aggregated, and

and

v[i][j] are the totals allocated to the pivot columns.

In the next section, I will show how we can use pivots with a number of useful summary and quantity reports.

Example Pivot Reports

In this section, I will cover how to generate some useful reports based on the Pivot operator.

Sales Summary Totals by Month

This then leads us to a sales summary monthly report that aggregates sales by the product, then partitions these aggregated totals into each month. The SQL query we use is shown below:

SELECT    [ProductName], Jan=[1], Feb=[2], Mar=[3], Apr=[4], May=[5], Jun=[6]
FROM
(
    SELECT    [ProductName], 
              Month=MONTH([SaleDate]), 
              Sales=SUM([SaleAmount])
    FROM      [ProductSales]
    GROUP BY  [ProductName], MONTH([SaleDate])
) As DataSource
PIVOT (SUM([Sales]) FOR [Month] IN ([1], [2], [3], [4], [5], [6])) pvt

The result is a column-valued report as follows:

ProductName  Jan   Feb  Mar  Apr  May  Jun
Apples		 100   200  150  250  300  150
Bananas      NULL  150  100  NULL 150  NULL
Oranges      200   NULL NULL 50   NULL NULL
Pears        NULL  100  NULL NULL 100  150

You can see that where there are no records that correspond to a sale for that given month, the aggregated total for that month is given a NULL total.

The above report is for sales totals that are for a product within each month.

What is we want sales totals that are cumulative? A cumulative total is when we accumulate the aggregated totals from the first month to the current month in the report column.

So, if I sold 100 apples in January, 200 in February, and 300 in March, then the cumulative total for February is 300 and for March it is 600. To visualize this, we can first write a basic table-valued GROUP BY query that shows the cumulative totals for each month in each row. This can be done with the following GROUP BY query:

SELECT   t.[ProductName], 
         Month=MONTH(t.[SaleDate]), 
         Sales= 
         (
            SELECT SUM([SaleAmount])
			FROM   [ProductSales]
			WHERE  [ProductName] = t.[ProductName] AND
                   MONTH([SaleDate]) <= Month(t.[SaleDate])
         )
FROM     [ProductSales] t
GROUP BY [ProductName], MONTH(t.[SaleDate])

The resulting table-valued result is as follows:

ProductName  Month  Sales
Apples       1      100
Apples       2      300
Apples       3      450
Apples       4      700
Apples       5      1000
Apples       6      1150
Bananas      2      150
Bananas      3      250
Bananas      5      400
Oranges      1      200
Oranges      4      250
Pears        2      100
Pears        5      200
Pears        6      350

The above doesn’t look too elegant, however it shows the month 1..6 and it’s accumulative total.

Sales Totals 6 Month Cumulative Summary

Now let’s improve on the above report with a column-valued cross tab report using the SQL PIVOT operator:

SELECT     [ProductName], 
           Jan=[1], Feb=[2], Mar=[3], Apr=[4], May=[5], Jun=[6]
FROM
(
      SELECT t.[ProductName], 
             Month=MONTH(t.[SaleDate]), 
             Sales= 
             (
                 SELECT SUM([SaleAmount])
                 FROM [ProductSales]
                 WHERE [ProductName] = t.[ProductName] AND 
                 MONTH([SaleDate]) <= Month(t.[SaleDate])
             )
       FROM [ProductSales] t
       GROUP BY [ProductName], MONTH(t.[SaleDate])
)
AS DataSource
PIVOT (SUM([Sales]) FOR [Month] IN ([1], [2], [3], [4], [5], [6])) pvt

Notice that we have taken the GROUP BY query and used it as a sub-query for the PIVOT query:

SELECT  [ProductName], 
        Jan=[1], Feb=[2], Mar=[3], Apr=[4], May=[5], Jun=[6]
FROM
(
    [GROUP BY sub-query]
)
AS DataSource
PIVOT (SUM([Sales]) FOR [Month] IN ([1], [2], [3], [4], [5], [6])) pvt

The resulting cumulative totals report is shown:

ProductName Jan  Feb  Mar  Apr  May  Jun
Apples      100  300  450  700  1000 1150
Bananas     NULL 150  250  NULL 400  NULL
Oranges     200  NULL NULL 250  NULL NULL
Pears       NULL 100  NULL NULL 200  350

Turning the above into a 12-month cumulative sales summary report is then quite straightforward.

We have some sample data that covers a 12-month duration:

12 Months Sales Data …

ID  ProductName SaleAmount	SaleDate
1   Apples      100         2021-01-01
2   Oranges     200         2021-01-15
3   Bananas     150         2021-02-01
4   Apples      200         2021-02-15
5   Pears       100         2021-02-15
6   Apples      150         2021-03-01
7   Bananas     100         2021-03-15
8   Apples      250         2021-04-01
9   Oranges     50          2021-04-15
10  Apples      300         2021-05-01
11  Pears       100         2021-05-15
12  Bananas     150         2021-05-15
13  Apples      150         2021-06-01
14  Pears       150         2021-06-15
15  Apples      50          2021-07-01
16  Oranges     100         2021-07-01
17  Pears       50          2021-07-15
18  Apples      100         2021-08-01
19  Bananas     200         2021-08-15
20  Oranges     350         2021-09-01
21  Pears       200         2021-09-15
22  Bananas     50          2021-09-15
23  Apples      400         2021-10-01
24  Oranges     150         2021-10-01
25  Pears       250         2021-11-01
26  Bananas     100         2021-11-01
27  Apples      250         2021-11-15
28  Apples      150         2021-12-01
29  Pears       100         2021-12-01
30  Oranges     450         2021-12-15

Sales Totals 12 Month Cumulative Summary

We will work some magic with the PIVOT operator to produce a cumulative summary report:

SELECT   [ProductName], 
         Jan=[1], 
         Feb=[2], 
         Mar=[3], 
         Apr=[4], 
         May=[5], 
         Jun=[6],
         Jul=[7],
         Aug=[8],
         Sep=[9],
         Oct=[10],
         Nov=[11],
         Dec=[12]
FROM
(
         SELECT t.[ProductName], Month=MONTH(t.[SaleDate]), 
                Sales= 
                (
                    SELECT SUM([SaleAmount])
                    FROM   [ProductSales]
                    WHERE  [ProductName] = t.[ProductName] AND
                           MONTH([SaleDate]) <= Month(t.[SaleDate])
                )
FROM     [ProductSales] t
GROUP BY [ProductName], MONTH(t.[SaleDate])
)
AS DataSource
PIVOT (SUM([Sales]) FOR [Month] 
IN ([1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12])) pvt

What we just did was to add an additional column Sales, to the report, summed up by the product, then pivoted the sales aggregation by the month of the product sale.

The resulting 12-month cumulative totals report is shown:

ProductName Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
Apples      100  300  450  700  1000 1150 1200 1300 NULL 1700 1950 2100
Bananas     NULL 150  250  NULL 400  NULL NULL 600  650  NULL 750  NULL
Oranges     200  NULL NULL 250  NULL NULL 350  NULL 700  850  NULL 1300
Pears       NULL 100  NULL NULL 200  350  400  NULL 600  NULL 850  950

An alternative report can be a count (quantity) of the total sales instead of the sales amount totals.

Sales Count by Month

The sales count by month is generated with the following GROUP BY query:

SELECT   Sales_Count=COUNT([SaleAmount]), 
         Month=MONTH([SaleDate])
FROM     ProductSales t
GROUP BY MONTH([SaleDate])

The table-valued result is then:

Sales_Count Month
2           1
3           2
2           3
2           4
3           5
2           6
3           7
2           8
3           9
2           10
3           11
3           12

Cumulative Sales Count by Month

We can now write the above query as a cumulative sales count for each month:

SELECT Sales_Count=
(
       SELECT  COUNT(s.[SaleAmount])  
       FROM    [ProductSales] s
		HERE   MONTH(s.[SaleDate]) <= MONTH(t.[SaleDate])
), 
Month=MONTH([SaleDate])
FROM    ProductSales t
GROUP BY MONTH([SaleDate])

The cumulative sales count by month is generated with the following GROUP BY query:

Sales_Count	Month
2           1
5           2
7           3
9           4
12          5
14          6
17          7
19          8
22          9
24          10
27          11
30          12

Quarterly Sales Quantity Report by Product with Grouping

What if we wanted a report that aggregated the sales counts by each quarter?

We can start with a basic GROUP BY query.

SELECT [ProductName],
(
    SELECT  COUNT(s.[SaleAmount])  
    FROM    [ProductSales] s
    WHERE   s.ProductName = t.ProductName AND 
            MONTH(s.[SaleDate]) >= 1 AND MONTH(s.[SaleDate]) <= 3
) AS Q1, 
(
    SELECT  COUNT(s.[SaleAmount])  
    FROM    [ProductSales] s
    WHERE   s.ProductName = t.ProductName AND 
            MONTH(s.[SaleDate]) IN (4, 5, 6)
) AS Q2,
(
    SELECT  COUNT(s.[SaleAmount])  
    FROM    [ProductSales] s
    WHERE   s.ProductName = t.ProductName AND 
            MONTH(s.[SaleDate]) IN (7, 8, 9)
) AS Q3,
(
    SELECT  COUNT(s.[SaleAmount])  
    FROM    [ProductSales] s
    WHERE   s.ProductName = t.ProductName AND 
            MONTH(s.[SaleDate]) IN (10, 11, 12)
) AS Q4
FROM     ProductSales t
WHERE    YEAR(t.[SaleDate]) = 2021
GROUP BY [ProductName]

The table-valued result is then:

ProductName Q1  Q2  Q3  Q4
Apples      3   3   2   3
Bananas     2   1   2   1
Oranges     1   1   2   2
Pears       1   2   2   2

Quarterly Sales Amount Report by Product with Grouping

We can also use the above to generate a report that aggregated the sales amounts by the quarter.

SELECT [ProductName],
(
    SELECT SUM(s.[SaleAmount])  
    FROM   [ProductSales] s
    WHERE  s.ProductName = t.ProductName AND MONTH(s.[SaleDate]) >= 1 AND 
           MONTH(s.[SaleDate]) <= 3
) AS Q1, 
(
    SELECT SUM(s.[SaleAmount])  
    FROM   [ProductSales] s
    WHERE  s.ProductName = t.ProductName AND MONTH(s.[SaleDate]) IN (4, 5, 6)
) AS Q2,
(
    SELECT SUM(s.[SaleAmount])  
    FROM   [ProductSales] s
    WHERE  s.ProductName = t.ProductName AND MONTH(s.[SaleDate]) IN (7, 8, 9)
) AS Q3,
(
    SELECT SUM(s.[SaleAmount])  
    FROM   [ProductSales] s
    WHERE  s.ProductName = t.ProductName AND MONTH(s.[SaleDate]) IN (10, 11, 12)
) AS Q4
FROM ProductSales t
WHERE YEAR(t.[SaleDate]) = 2021
GROUP BY [ProductName]

The table-valued result is then:

ProductName Q1  Q2  Q3  Q4
Apples      450 700 150 800
Bananas     250 150 250 100
Oranges     200 50  450 600
Pears       100 250 250 350

Quarterly Sales Amount Report by Product with Pivot

The next step is to write the above quarterly sales amount report using the PIVOT operator.

SELECT  [ProductName], 
        Q1=[1], Q2=[2], Q3=[3], Q3=[4]
FROM
(
        SELECT [ProductName], Quarter=1, 
               Sales=SUM([SaleAmount])
        FROM   [ProductSales]
GROUP BY [ProductName], MONTH([SaleDate])
HAVING (MONTH([SaleDate]) = 1) OR 
       (MONTH([SaleDate]) = 2) OR 
       (MONTH([SaleDate]) = 3)
UNION
SELECT   [ProductName], Quarter=2, Sales=SUM([SaleAmount])
FROM     [ProductSales]
GROUP BY [ProductName], MONTH([SaleDate])
HAVING   (MONTH([SaleDate]) = 4) OR 
         (MONTH([SaleDate]) = 5) OR 
         (MONTH([SaleDate]) = 6)
UNION
SELECT   [ProductName], Quarter=3, Sales=SUM([SaleAmount])
FROM     [ProductSales]
GROUP BY [ProductName], MONTH([SaleDate])
HAVING   (MONTH([SaleDate]) = 7) OR 
         (MONTH([SaleDate]) = 8) OR 
         (MONTH([SaleDate]) = 9)
UNION
SELECT   [ProductName], Quarter=4, Sales=SUM([SaleAmount])
FROM     [ProductSales]
GROUP BY [ProductName], MONTH([SaleDate])
HAVING   (MONTH([SaleDate]) = 10) OR 
         (MONTH([SaleDate]) = 11) OR 
         (MONTH([SaleDate]) = 12)
)
As DataSource
PIVOT
(SUM([Sales]) FOR [Quarter] IN ([1], [2], [3], [4])) pvt

The column-valued result is then:

ProductName Q1  Q2  Q3  Q3
Apples      450 700 150 800
Bananas     250 150 250 100
Oranges     200 50  450 600
Pears       100 250 250 350

As you can see, we have managed to implement using a combination of the familiar GROUP BY and PIVOT operators, some elegant query results that can be utilised within many applications that need to group aggregations of data into columnar buckets. Without the advantage in being able to make use of the PIVOT operator, we would have to resort to utilising some additional scripting within a stored procedure or within our application business logic to generate the pivot columns.

The above queries could be used in useful scenarios like generating reports with the totals of different types of users such as using a particular service or site utilization in particular areas on a given month of the year.

For more details refer to the PIVOT operator on the Microsoft site.

That is all for today’s post.

I hope you have found this post useful and informative.

Social media & sharing icons powered by UltimatelySocial