Welcome to today’s post.
In today’s post I will be discussing the different analytical clauses that can be used within a T-SQL SELECT query. Many of us are familiar with using the ORDER, GROUP BY and HAVING clauses within a SELECT query to aggregate results within columns.
I will be explaining what each of the following clauses is:
ROW_NUMBER()
RANK()
OVER
PARTITION BY
And how they are used within a SELECT query.
I will also discuss how they compare to when we use the GROUP BY clause when grouping results in a SELECT query.
A Sample Dataset
To demonstrate the various clauses, I will provide a sample data set of Books and Loans from the fictitious book loan library.
The Book data is shown below:
SELECT [ID],
[Title],
[Author],
[Genre],
[YearPublished]
FROM [aspnet-BookCatalog].[dbo].[Books]
When selected the sample book records are as shown:
ID Title Author Genre Year Published
1 The Lord of the Rings J. R. R. Tolkien Fantasy 1954
2 The Alchemist (O Alquimista) Paulo Coelho Fantasy 1988
3 The Little Prince (Le Petit Antoine de Fantasy 1943
Prince) Saint-Exupéry
4 Grimms' Fairy Tales Jacob and Wilhelm Folklore 1812
(Kinder- und Hausmärchen) Grimm
5 Harry Potter and the J. K. Rowling Fantasy 1997
Philosopher's Stone
6 The Hobbit J. R. R. Tolkien Fantasy 1937
7 And Then There Were None Agatha Christie Mystery 1939
8 Dream of the Red Chamber Cao Xueqin Family 1791
9 American Dirt Jeanine Cummins Fiction 2020
10 Thief River Falls Brian Freeman Fiction 2020
11 The Night Dragon Matthew Condon Fiction 2019
And the query for the Loan data for the above Books is:
SELECT [ID],
[BookID],
[DateLoaned]
FROM [aspnet-BookCatalog].[dbo].[Loans]
And the Loan data for the above Books is below:
ID BookID DateLoaned
1 5 2020-12-12 00:00:00.0000000
2 8 2020-12-12 00:00:00.0000000
3 11 2020-12-13 00:00:00.0000000
4 5 2020-12-12 00:00:00.0000000
5 9 2020-12-14 00:00:00.0000000
6 6 2020-12-15 00:00:00.0000000
7 3 2020-12-15 00:00:00.0000000
8 4 2020-12-26 00:00:00.0000000
9 2 2020-12-26 00:00:00.0000000
10 5 2020-12-12 00:00:00.0000000
11 1 2021-04-10 00:00:00.0000000
12 1 2021-04-21 11:23:25.1000000
13 10 0001-01-01 00:00:00.0000000
14 10 2021-07-10 00:01:57.4860000
In the next section, I will show how to apply basic grouping to the above dataset.
Grouping Data with GROUP BY
The first query shows the number of loans for each book. This is done by using the GROUP BY to aggregate the number of loans for each book title into a column. The results are ordered by commencing with the highest number of loans.
This is done as follows with the following query:
SELECT b.Title, COUNT(l.BookID) As [Number of Loans]
FROM Books AS b
INNER JOIN Loans AS l
ON b.ID = l.BookID
GROUP BY b.Title
ORDER BY COUNT(l.BookID) DESC
The data returned from the query is shown:
Title Number of Loans
Harry Potter and the Philosopher's Stone 3
The Lord of the Rings 2
Thief River Falls 2
The Night Dragon 1
American Dirt 1
Dream of the Red Chamber 1
Grimms' Fairy Tales (Kinder- und Hausmärchen) 1
The Alchemist (O Alquimista) 1
The Hobbit 1
The Little Prince (Le Petit Prince) 1
A variation of GROUP BY is the ROLLUP option that groups each combination of non-aggregated columns in the SELECT clause as a sub-total.
The SELECT below illustrates the ROLLUP:
SELECT b.Genre, b.Location, COUNT(l.BookID) As [Number of Loans]
FROM Books AS b
INNER JOIN Loans AS l
ON b.ID = l.BookID
GROUP BY ROLLUP (b.Genre, b.Location)
The resulting data returned from the query is:
Genre Location Number of Loans
family Sydney 1
family NULL 1
fantasy Sydney 8
fantasy NULL 8
Fiction Lane Cove 3
Fiction Sydney 1
Fiction NULL 4
folklore Sydney 1
folklore NULL 1
NULL NULL 14
Adding Row Identifiers with ROW_NUMBER() to a Result Set
One common feature of result sets is to include the numeric row of each record. In the first query we returned the books with the number of loans ordered in descending order by the number of loans. We can return the same result set and include the row number for each row in the result set, with the lowest row number commencing with the highest loan. The ROW_NUMBER() function is a window function that is followed by the OVER clause to specify a result set restriction. The ordering restriction of the ROW_NUMBER() is then:
ORDER BY COUNT(l.BookID) DESC
This is achieved with the following SELECT query:
SELECT b.Title, COUNT(l.BookID) As [Number of Loans],
ROW_NUMBER() OVER (ORDER BY COUNT(l.BookID) DESC) AS Rank
FROM Books AS b
INNER JOIN Loans AS l
ON b.ID = l.BookID
GROUP BY b.Title
The resulting data returned from the above query is:
Title Number of Loans Rank
Harry Potter and the Philosopher's Stone 3 1
The Lord of the Rings 2 2
Thief River Falls 2 3
The Night Dragon 1 4
American Dirt 1 5
Dream of the Red Chamber 1 6
Grimms' Fairy Tales (Kinder- und Hausmärchen) 1 7
The Alchemist (O Alquimista) 1 8
The Hobbit 1 9
The Little Prince (Le Petit Prince) 1 10
The difference between the ROW_NUMBER() and RANK() window functions
Notice that the ROW_NUMBER() column in the above result does not give loans that are equal the same row ordering number. We can make use of the RANK() Window function to give each result row that has the same loan count the same rank.
The SELECT query below shows how this is achieved:
SELECT b.Title, COUNT(l.BookID) As [Number Of Loans],
RANK() OVER (ORDER BY COUNT(l.BookID) DESC) AS Rank
FROM Books AS b
INNER JOIN Loans AS l
ON b.ID = l.BookID
GROUP BY b.Title
The resulting data returned from the above query is:
Title Number of Loans Rank
Harry Potter and the Philosopher's Stone 3 1
The Lord of the Rings 2 2
Thief River Falls 2 2
The Night Dragon 1 4
American Dirt 1 4
Dream of the Red Chamber 1 4
Grimms' Fairy Tales (Kinder- und Hausmärchen) 1 4
The Alchemist (O Alquimista) 1 4
The Hobbit 1 4
The Little Prince (Le Petit Prince) 1 4
We can also rank the number of loans by the book Genre as shown with the following query:
SELECT b.Genre,
COUNT(l.BookID) As [Number of Loans],
RANK() OVER (ORDER BY COUNT(l.BookID) DESC) AS Rank
FROM Books AS b
INNER JOIN Loans AS l
ON b.ID = l.BookID
GROUP BY b.Genre
The resulting data returned from the above query is:
Genre Number of Loans Rank
Fantasy 8 1
Fiction 4 2
Folklore 1 3
Family 1 3
We can also order the number of loans by the book Genre as shown with the following query:
SELECT b.Genre, COUNT(l.BookID) As [Number Of Loans],
ROW_NUMBER() OVER (ORDER BY COUNT(l.BookID) DESC) AS Rank
FROM Books AS b
INNER JOIN Loans AS l
ON b.ID = l.BookID
GROUP BY b.Genre
The resulting data returned from the above query is:
Genre Number of Loans Rank
Fantasy 8 1
Fiction 4 2
Folklore 1 3
Family 1 4
Using the PARTITION BY record set restriction of a window function
With the ROW_NUMBER() and RANK() window functions we used to provide numbering for result ranking, we can further restrict the window function within a partition within the number of loans column. Within the Book table, we can use the Genre column to partition the ranking of loans within each genre. The restriction can be applied to the window function as shown:
PARTITION BY b.Genre
ORDER BY COUNT(l.BookID)
The SELECT query below shows how this is achieved:
SELECT b.Title, b.Genre, COUNT(l.BookID) As [Number Of Loans],
RANK() OVER (
PARTITION BY b.Genre
ORDER BY COUNT(l.BookID) DESC
) AS Rank
FROM Books AS b
INNER JOIN Loans AS l
ON b.ID = l.BookID
GROUP BY b.Title, b.Genre
The resulting data returned from the above query is:
Title Genre Number of Loans Rank
Dream of the Red Chamber Family 1 1
Harry Potter and the Philosopher's Stone Fantasy 3 1
The Lord of the Rings Fantasy 2 2
The Alchemist (O Alquimista) Fantasy 1 3
The Hobbit Fantasy 1 3
The Little Prince (Le Petit Prince) Fantasy 1 3
Thief River Falls Fiction 2 1
American Dirt Fiction 1 2
The Night Dragon Fiction 1 2
Grimms' Fairy Tales (Kinder- und Hausmärchen) Folklore 1 1
In the above SELECT queries, the use of GROUP BY is an aggregation, and the use of PARTITION BY is analytic. The GROUP BY rolls up the rows, then applies an aggregate function for each row, then returns the result in the aggregated row.
The OVER clause is a window of rows, whereby a window function such as ROW_NUMBER() or RANK() is restricted by an argument, such as ORDER BY or PARTITION BY that orders or partitions the records within the result set.
For performance reasons, we can use GROUP BY to rollup results into fewer rows.
For analysis and reporting, we can use the OVER and PARTITION BY clauses to further restrict and reorder subsets of results before they are output in the result set.
The most useful part of the ROW_NUMBER() and RANK() functions with the PARTITION BY clause is to provide numeric positioning, ranking and aggregation of columns of data in a grouped window of rows which is useful in many application reporting scenarios where such ordering is required.
That is all for today’s post.
I hope you have found this post useful and informative.
Andrew Halil is a blogger, author and software developer with expertise of many areas in the information technology industry including full-stack web and native cloud based development, test driven development and Devops.