Data Analytics
Analytics Reporting SQL SQL Server T-SQL

How to use OVER and PARTITION BY in SQL queries

Welcome to today’s post.

In today’s post I will be discussing the different analytical clauses that can be used within a T-SQL SELECT query. Many of us are familiar with using the ORDER, GROUP BY and HAVING clauses within a SELECT query to aggregate results within columns.

I will be explaining what each of the following clauses is:

ROW_NUMBER()

RANK()

OVER

PARTITION BY

And how they are used within a SELECT query.

I will also discuss how they compare to when we use the GROUP BY clause when grouping results in a SELECT query.

A Sample Dataset

To demonstrate the various clauses, I will provide a sample data set of Books and Loans from the fictitious book loan library.

The Book data is shown below:

SELECT [ID],
      	[Title],
      	[Author],
      	[Genre],
      	[YearPublished]
FROM 	[aspnet-BookCatalog].[dbo].[Books]

When selected the sample book records are as shown:

ID  Title                               Author            Genre    Year Published
1   The Lord of the Rings               J. R. R. Tolkien  Fantasy  1954
2   The Alchemist (O Alquimista)        Paulo Coelho      Fantasy  1988
3   The Little Prince (Le Petit         Antoine de        Fantasy  1943 
     Prince)                            Saint-Exupéry    
4   Grimms' Fairy Tales                 Jacob and Wilhelm Folklore 1812
    (Kinder- und Hausmärchen)           Grimm             	
5   Harry Potter and the                J. K. Rowling     Fantasy  1997 
    Philosopher's Stone
6   The Hobbit                          J. R. R. Tolkien  Fantasy  1937
7   And Then There Were None            Agatha Christie   Mystery  1939
8   Dream of the Red Chamber            Cao Xueqin        Family   1791
9   American Dirt                       Jeanine Cummins   Fiction  2020
10  Thief River Falls                   Brian Freeman     Fiction  2020
11  The Night Dragon                    Matthew Condon	  Fiction  2019

And the query for the Loan data for the above Books is:

SELECT [ID],
      	[BookID],
      	[DateLoaned]
FROM 	[aspnet-BookCatalog].[dbo].[Loans]

And the Loan data for the above Books is below:

ID  BookID	DateLoaned
1   5       2020-12-12 00:00:00.0000000
2   8       2020-12-12 00:00:00.0000000
3   11      2020-12-13 00:00:00.0000000
4   5       2020-12-12 00:00:00.0000000
5   9       2020-12-14 00:00:00.0000000
6   6       2020-12-15 00:00:00.0000000
7   3       2020-12-15 00:00:00.0000000
8   4       2020-12-26 00:00:00.0000000
9   2       2020-12-26 00:00:00.0000000
10  5       2020-12-12 00:00:00.0000000
11  1       2021-04-10 00:00:00.0000000
12  1       2021-04-21 11:23:25.1000000
13  10      0001-01-01 00:00:00.0000000
14  10      2021-07-10 00:01:57.4860000

In the next section, I will show how to apply basic grouping to the above dataset.

Grouping Data with GROUP BY

The first query shows the number of loans for each book. This is done by using the GROUP BY to aggregate the number of loans for each book title into a column. The results are ordered by commencing with the highest number of loans.

This is done as follows with the following query:

SELECT 	    b.Title, COUNT(l.BookID) As [Number of Loans]
FROM        Books AS b   
INNER JOIN  Loans AS l 
            ON b.ID = l.BookID
GROUP BY    b.Title
ORDER BY    COUNT(l.BookID) DESC

The data returned from the query is shown:

Title                                          Number of Loans
Harry Potter and the Philosopher's Stone       3
The Lord of the Rings                          2
Thief River Falls                              2
The Night Dragon                               1
American Dirt                                  1
Dream of the Red Chamber                       1
Grimms' Fairy Tales (Kinder- und Hausmärchen)  1
The Alchemist (O Alquimista)                   1
The Hobbit                                     1
The Little Prince (Le Petit Prince)            1

A variation of GROUP BY is the ROLLUP option that groups each combination of non-aggregated columns in the SELECT clause as a sub-total.

The SELECT below illustrates the ROLLUP:

SELECT          b.Genre, b.Location, COUNT(l.BookID) As [Number of Loans]
FROM            Books AS b   
INNER JOIN      Loans AS l   
                ON b.ID = l.BookID
GROUP BY ROLLUP (b.Genre, b.Location)

The resulting data returned from the query is:

Genre    Location  Number of Loans
family 	 Sydney    1
family   NULL      1
fantasy  Sydney    8
fantasy  NULL      8
Fiction  Lane Cove 3
Fiction  Sydney    1
Fiction  NULL      4
folklore Sydney    1
folklore NULL      1
NULL     NULL      14

Adding Row Identifiers with ROW_NUMBER() to a Result Set

One common feature of result sets is to include the numeric row of each record. In the first query we returned the books with the number of loans ordered in descending order by the number of loans. We can return the same result set and include the row number for each row in the result set, with the lowest row number commencing with the highest loan. The ROW_NUMBER() function is a window function that is followed by the OVER clause to specify a result set restriction. The ordering restriction of the ROW_NUMBER() is then:

ORDER BY COUNT(l.BookID) DESC

This is achieved with the following SELECT query:

SELECT      b.Title, COUNT(l.BookID) As [Number of Loans],
ROW_NUMBER() OVER (ORDER BY COUNT(l.BookID) DESC) AS Rank  
FROM        Books AS b   
INNER JOIN  Loans AS l   
            ON b.ID = l.BookID
GROUP BY    b.Title

The resulting data returned from the above query is:

Title                                         Number of Loans Rank
Harry Potter and the Philosopher's Stone      3               1
The Lord of the Rings                         2               2
Thief River Falls                             2               3
The Night Dragon                              1               4
American Dirt                                 1               5
Dream of the Red Chamber                      1               6
Grimms' Fairy Tales (Kinder- und Hausmärchen) 1			      7
The Alchemist (O Alquimista)                  1               8
The Hobbit                                    1               9
The Little Prince (Le Petit Prince)           1               10

The difference between the ROW_NUMBER() and RANK() window functions

Notice that the ROW_NUMBER() column in the above result does not give loans that are equal the same row ordering number. We can make use of the RANK() Window function to give each result row that has the same loan count the same rank.

The SELECT query below shows how this is achieved:

SELECT       b.Title, COUNT(l.BookID) As [Number Of Loans],
RANK() OVER (ORDER BY COUNT(l.BookID) DESC) AS Rank  
FROM         Books AS b   
INNER JOIN 	 Loans AS l   
             ON b.ID = l.BookID
GROUP BY     b.Title

The resulting data returned from the above query is:

Title                                          Number of Loans	Rank
Harry Potter and the Philosopher's Stone       3                1
The Lord of the Rings                          2                2
Thief River Falls                              2                2
The Night Dragon                               1                4
American Dirt                                  1                4
Dream of the Red Chamber                       1                4
Grimms' Fairy Tales (Kinder- und Hausmärchen)  1                4
The Alchemist (O Alquimista)                   1                4
The Hobbit                                     1                4
The Little Prince (Le Petit Prince)            1                4

We can also rank the number of loans by the book Genre as shown with the following query:

SELECT      b.Genre, 
COUNT(l.BookID) As [Number of Loans],
RANK() OVER (ORDER BY COUNT(l.BookID) DESC) AS Rank  
FROM        Books AS b   
INNER JOIN  Loans AS l   
            ON b.ID = l.BookID
GROUP BY    b.Genre

The resulting data returned from the above query is:

Genre       Number of Loans	Rank
Fantasy     8               1
Fiction     4               2
Folklore    1               3
Family      1               3

We can also order the number of loans by the book Genre as shown with the following query:

SELECT      b.Genre, COUNT(l.BookID) As [Number Of Loans],
            ROW_NUMBER() OVER (ORDER BY COUNT(l.BookID) DESC) AS Rank  
FROM        Books AS b   
INNER JOIN  Loans AS l   
            ON b.ID = l.BookID
GROUP BY    b.Genre

The resulting data returned from the above query is:

Genre       Number of Loans	Rank
Fantasy     8               1
Fiction     4               2
Folklore    1               3
Family      1               4

Using the PARTITION BY record set restriction of a window function

With the ROW_NUMBER() and RANK() window functions we used to provide numbering for result ranking, we can further restrict the window function within a partition within the number of loans column. Within the Book table, we can use the Genre column to partition the ranking of loans within each genre. The restriction can be applied to the window function as shown:

PARTITION BY b.Genre
ORDER BY COUNT(l.BookID)

The SELECT query below shows how this is achieved:

SELECT     b.Title, b.Genre, COUNT(l.BookID) As [Number Of Loans],
           RANK() OVER (
               PARTITION BY b.Genre 
               ORDER BY COUNT(l.BookID) DESC
           ) AS Rank  
FROM       Books AS b   
INNER JOIN Loans AS l   
           ON b.ID = l.BookID
GROUP BY   b.Title, b.Genre

The resulting data returned from the above query is:

Title                                         Genre     Number of Loans  Rank
Dream of the Red Chamber                      Family    1                1
Harry Potter and the Philosopher's Stone      Fantasy   3                1
The Lord of the Rings                         Fantasy   2                2
The Alchemist (O Alquimista)                  Fantasy   1                3
The Hobbit                                    Fantasy   1                3
The Little Prince (Le Petit Prince)           Fantasy   1                3
Thief River Falls                             Fiction   2                1
American Dirt                                 Fiction   1                2
The Night Dragon                              Fiction   1                2
Grimms' Fairy Tales (Kinder- und Hausmärchen) Folklore  1                1

In the above SELECT queries, the use of GROUP BY is an aggregation, and the use of PARTITION BY is analytic. The GROUP BY rolls up the rows, then applies an aggregate function for each row, then returns the result in the aggregated row.

The OVER clause is a window of rows, whereby a window function such as ROW_NUMBER() or RANK() is restricted by an argument, such as ORDER BY or PARTITION BY that orders or partitions the records within the result set.

For performance reasons, we can use GROUP BY to rollup results into fewer rows.

For analysis and reporting, we can use the OVER and PARTITION BY clauses to further restrict and reorder subsets of results before they are output in the result set.

The most useful part of the ROW_NUMBER() and RANK() functions with the PARTITION BY clause is to provide numeric positioning, ranking and aggregation of columns of data in a grouped window of rows which is useful in many application reporting scenarios where such ordering is required.

That is all for today’s post.

I hope you have found this post useful and informative.

Social media & sharing icons powered by UltimatelySocial