Welcome to today’s blog.
In today’s post I will be showing how to use the GroupBy() extension method to group data within tables using the Entity Framework Core ORM.
I will be showing how to use data grouping with a few different methods using LINQ queries, and to apply grouping in the most efficient ways possible to achieve the same outcome.
When we want to group a table of Loan records by the BookID we could use a LINQ to SQL query to produce the grouped records.
This is done as shown:
var getGroupedLoanBooks = (from loan in _db.Loans
group loan by loan.BookID into grouping
select grouping);
Using the GroupBy() Extension Method
In Entity Framework Core we can make use of grouping by using the GroupBy() extension method to generate a list of tuples of type IGrouping<TKey, TElement>, where TKey of the key of each element, and TElement is the element being selected.
The basic overload is as shown:
var groupedLoansByBook = loans.GroupBy(
book => book.BookID,
books => books.BookID,
(book, books) => new
{
Key = book,
Count = books.Count()
});
Where:
the following key selector obtains the key from the BookID field:
book => book.BookID,
the following element selector obtains the element from the same BookID field:
books => books.BookID,
the following result selector obtains a result value from each group consisting of the key and the count of the books elements that have the matching key:
(book, books) => new
{
Key = book,
Count = books.Count()
});
With our EF data context, we can apply the grouping as shown:
var groupedLoansByBook = _db.Loans.GroupBy(
book => book.BookID,
books => books.BookID,
(book, books) => new
{
Key = book,
Count = books.Count()
});
Note: You should avoid doing the following:
var loans = await _db.Loans.ToListAsync();
var groupedLoansByBook = loans.GroupBy(…);
as the Loans table, will in cases where it has many records will give an inefficient query.
The following should also be avoided as it will result in an open data reader error:
var groupedLoansByBook = _db.Loans.GroupBy(…);
foreach (var result in groupedLoansByBook)
{
int bookID = Convert.ToInt32(result.Key.ToString());
var book = _db.Books.Where(b => b.ID == bookID).SingleOrDefault();
…
}
The error you will get from the SQL Client library is shown below:
System.InvalidOperationException: There is already an open DataReader associated with this Command which must be closed first.
at Microsoft.Data.SqlClient.SqlInternalConnectionTds.ValidateConnectionForExecute(SqlCommand command)
at Microsoft.Data.SqlClient.SqlConnection.ValidateConnectionForExecute(String method, SqlCommand command)
at Microsoft.Data.SqlClient.SqlCommand.ValidateCommand(Boolean isAsync, String method)
In the next section, I will show how to execute queries on multiple tables without incurring the data reader error for the synchronous operations.
Executing LINQ Queries with Multiple Data Reader Connections
Since the LINQ GroupBy() method is not asynchronous (with anonymous typed results) the loans group query and book retrieval:
var book = _db.Books.Where(b => b.ID == bookID).SingleOrDefault();
will need to be executed in different reader connections. The for loop above on the grouped results still has a connection to the database. To release the database connection the group query has we need to call the ToList() extension method to force the group query to load the result set into the client memory.
This can be achieved in the following way:
foreach (var result in groupedLoansByBook.ToList())
{
int bookID = Convert.ToInt32(result.Key.ToString());
var book = _db.Books.Where(b => b.ID == bookID).SingleOrDefault();
...
}
The group query results will now be able to be processed within our loop. After we generate the resulting tuples of grouped elements, the next step we can transform each grouped result value into a structure that has both the book details and the loan count for the book:
var loanRanking = new List<TopBooksLoanedDetailReportViewModel>();
foreach (var result in groupedLoansByBook)
{
int bookID = Convert.ToInt32(result.Key.ToString());
var book = _db.Books.Where(b => b.ID == bookID).SingleOrDefault();
if (book != null)
loanRanking.Add(
new Models.ReportViewModels.TopBooksLoanedDetailReportViewModel()
{
title = book.Title,
author = book.Author,
count = result.Count
}
);
}
Where our class for storing the results is shown below:
public class TopBooksLoanedDetailReportViewModel
{
public int ranking { get; set; }
public int count { get; set; }
public string title { get; set; }
public string author { get; set; }
}
We simply matched our key:
result.Key
from each grouped result item, retrieved the matching Book record, then populated the loanRanking list with some of the book details (title, author) with the count of the total loans for the book.
When the above is run we can see how the grouping query generates our key and element tuple list as shown:
We can see it is unordered. After we process the grouped records in the foreach loop
Following from this, we can order the resulting records by the highest loan count in descending order so that the highest number of loans is output first.
This is achieved as shown:
loanRanking = loanRanking.OrderByDescending(c => c.count).ToList();
We then compute the ranking for each book loan record with the record with the highest number of loans the highest ranking (1=highest). We do this as shown:
var ranking = 1;
loanRanking.ForEach(r =>
{
r.ranking = ranking;
ranking++;
});
return loanRanking;
The output of our ordered books by highest loans is shown:
[
{
"ranking": 1,
"count": 3,
"title": "Harry Potter and the Philosopher's Gold",
"author": "J. R. Bowling"
},
{
"ranking": 2,
"count": 2,
"title": "The Lord of the Rings",
"author": "J. R. S. Tolken"
},
{
"ranking": 3,
"count": 1,
"title": "Dream of the Blue Chamber",
"author": "Tao Xueqin"
},
{
"ranking": 4,
"count": 1,
"title": "The Night Dragon",
"author": "Matthew Crouton"
}
….
]
As we can see, the GroupBy() LINQ extension method can be applied to our Entity Framework Core data context to grouping a table of data to produce useful outputs that can be utilized in Web API calls and also for report queries.
That is all for today’s post.
I hope you found this post useful and informative.
Andrew Halil is a blogger, author and software developer with expertise of many areas in the information technology industry including full-stack web and native cloud based development, test driven development and Devops.