Five levels of performance tuning for an EF Core query

Last Updated: March 4, 2021 | Created: February 23, 2021

This is a companion article to the EF Core Community Standup called “Performance tuning an EF Core app” where I apply a series of performance enhancements to a demo ASP.NET Core e-commerce book selling site called the Book App. I start with 700 books, then 100,000 books and finally ½ million books.

This article, plus the EF Core Community Standup video, pulls information from chapters 14 to 16 from my book “Entity Framework Core in Action, 2nd edition” and uses code in the associated GitHub repo https://github.com/JonPSmith/EfCoreinAction-SecondEdition.

NOTE: You can download the code and run the application described in this article/video via the https://github.com/JonPSmith/EfCoreinAction-SecondEdition GitHub repo. Select the Part3 branch and run the project called BookApp.UI. The home page of the Book App has information on how to change the Book App’s settings for chapter 15 (four SQL versions) and chapter 16 (Cosmos DB).

Other articles that are relevant to the performance tuning shown in this article

TL;DR – summary

  • The demo e-commerce book selling site displays books with the various sort, filter and paging features that you might expect to need. One of the hardest of the queries is to sort the books by their average votes (think Amazon’s star ratings).
  • At 700 books a well-designed LINQ query is all you need.
  • At 100,000 books (and ½ million reviews) LINQ on its own isn’t good enough. I add three new ways to handle the book display, each one improving performance but also taking more development effort.
  • At ½ million books (and 2.7 million reviews) SQL on its own has some serious problems, so I swap to a Command Query Responsibility Segregation (CQRS) architecture, with the read-side using a Cosmos DB database (Cosmos DB is a NOSQL database).
  • The use of Cosmos DB with EF Core highlights
    • How Cosmos DB is different from a relational (SQL) database
    • The limitations in EF Core’s Cosmos DB database provider
  • At the end I give my view of performance gain against development time.

The Book App and its features

The Book App is a demo e-commerce site that sells books. In my book “Entity Framework Core in Action, 2nd edition” I use this Book App as an example of using various EF Core features. It starts out with about 50 books in it, but in Part3 of the book I spend three chapters on performance tuning and take the number of books up to 100,000 books and then to ½ million books. Here is a screenshot of the Book App running in “Chapter 15” mode, where it shows four different modes of querying a SQL Server database.

The Book App query which I improve has the following Sort, Filter, Page features

  • Sort: Price, Publication Date, Average votes, and primary key (default)
  • Filter: By Votes (1+, 2+, 3+, 4+), By year published, By tag, (defaults to no filter)
  • Paging: Num books shown (default 100) and page num

Note: a book can be soft deleted, which means there is always an extra filter on the books shown.

The book part of the database (the part of the database that handles orders isn’t shown) looks like this.

First level of performance tuning – Good LINQ

One way to load a Book with its relationships is by using Includes (see code below)

var books = context.Books
    .Include(book => book.AuthorsLink
        .OrderBy(bookAuthor => bookAuthor.Order)) 
            .ThenInclude(bookAuthor => bookAuthor.Author)
    .Include(book => book.Reviews)
    .Include(book => book.Tags)
    .ToList();

But that isn’t the best way to load books if you want good performance. That’s because a) you are loading a lot of data that you don’t need and b) you would need to do the sorting and filtering in software, which is slow. So here are my five rules for building fast, read-only queries.

  1. Don’t load data you don’t need, e.g. use the Select method to pick out only what is needed.
    See lines 18 to 24 of my MapBookToDto class.
  2. Don’t Include relationships but pick out what you need from the relationships.
    See lines 25 to 30 of my MapBookToDto class.
  3. If possible, move calculations into the database.
    See lines 13 to 34 of my MapBookToDto class.
  4. Add SQL indexes to any property you sort or filter on.
    See the configuration of the Book entity.
  5. Add AsNoTracking method to your query (or don’t load any entity classes).
    See line 29 in ListBookService class

NOTE: Rule 3 is the hardest to get right. Just remember that some SQL commands, like Average (SQL AVG), can return null if there are no entries, which needs a cast to a nullable type to make it work.
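
For example, here is a minimal sketch of the kind of Select projection rule 3 implies (the property names follow this article, but it isn’t the Book App’s exact code):

var query = context.Books
    .Select(b => new
    {
        b.Title,
        // The (double?) cast makes LINQ use the nullable Average overload,
        // so a book with no reviews returns null instead of throwing
        // when SQL AVG returns NULL.
        AverageVotes = b.Reviews.Select(r => (double?)r.NumStars).Average()
    });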

So, combining the Select, Sort, Filter and paging my code looks like this.

public async Task<IQueryable<BookListDto>> SortFilterPageAsync
    (SortFilterPageOptions options)
{
    var booksQuery = _context.Books 
        .AsNoTracking() 
        .MapBookToDto() 
        .OrderBooksBy(options.OrderByOptions) 
        .FilterBooksBy(options.FilterBy, options.FilterValue); 

    await options.SetupRestOfDtoAsync(booksQuery); 

    return booksQuery.Page(options.PageNum - 1, 
        options.PageSize); 
}
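
The Page method on the end is presumably a thin wrapper over Skip/Take. Here is a minimal sketch of what such an extension method could look like (not necessarily the Book App’s exact code):

public static IQueryable<T> Page<T>(this IQueryable<T> query,
    int pageNumZeroStart, int pageSize)
{
    if (pageNumZeroStart != 0)
        query = query.Skip(pageNumZeroStart * pageSize); // skip the earlier pages
    return query.Take(pageSize); // then take one page's worth of rows
}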

Using these rules will start you off with a good LINQ query, which is a great starting point. The next sections cover what to do if that doesn’t give you the performance you want.

When the five rules aren’t enough

The query above is going to work well when there aren’t many books, but in chapter 15 I create a database containing 100,000 books with 540,000 reviews. At this point the “five rules” version has some performance problems, so I create three new approaches, each of which a) improves performance and b) takes more development effort. Here is a list of the four approaches, with the Good LINQ version as our base performance version.

  1. Good LINQ: This uses the “five rules” approach. We compare all the other versions against this query.
  2. SQL (+UDFs): This combines LINQ with SQL UDFs (user-defined functions) to move the concatenations of Authors’ Names and Tags into the database (see the sketch after this list).
  3. SQL (Dapper): This creates the required SQL commands and then uses the Micro-ORM Dapper to execute that SQL to read the data.
  4. SQL (+caching): This pre-calculates some of the costly query parts, like the averages of the Review’s NumStars (referred to as votes).
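
To give an idea of the SQL (+UDFs) approach mentioned in point 2, here is a minimal sketch of mapping a scalar UDF so that a LINQ Select can call it. The UDF name and usage are illustrative, not the Book App’s actual code, and the UDF itself must be created in the database via a SQL script or migration.

public class BookDbContext : DbContext
{
    // .NET stub that EF Core translates into a call of the SQL scalar UDF
    public static string AuthorsStringUdf(int bookId)
        => throw new NotImplementedException(); // never runs in .NET

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // Map the static method above to the database function of the same name
        modelBuilder.HasDbFunction(() => AuthorsStringUdf(default));
    }
    // ... DbSets and the rest of the configuration
}

// Usage in a query – the string concatenation now happens inside SQL Server
// var dtos = context.Books
//     .Select(b => new { b.Title, Authors = BookDbContext.AuthorsStringUdf(b.BookId) });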

In the video I describe how I built each of these queries and show the performance for the hardest query, which is sort by review votes.

NOTE: The SQL (+caching) version is very complex, and I skipped over how I built it, but I have an article called “A technique for building high-performance databases with EF Core” which describes how I did this. Also, chapter 15 of my book “Entity Framework Core in Action, 2nd edition” covers this too.

Here is a chart I showed in the video, which provides performance timings for three queries, from the hardest (sort by votes) down to a simple query (sort by date).

The other chart I showed was a breakdown of the parts of the simple query, sort by date. I wanted to show this to point out that Dapper (which is a micro-ORM) is only significantly faster than EF Core if you have better SQL than EF Core produces.

Once you have a performance problem, just taking a few milliseconds off isn’t going to be enough – typically you need to cut its time by at least 33% and often more. Therefore, using Dapper to shave off a few milliseconds over EF Core isn’t worth the development time. So, my advice is to study the SQL that EF Core creates, and if you know a way to improve the SQL, then Dapper is a good solution.

Going bigger – how to handle ½ million or more books

In chapter 16 I build what is called a Command Query Responsibility Segregation (CQRS) architecture. The CQRS architecture acknowledges that the read side of an application is different from the write side. Reads are often complicated, drawing in data from multiple places, whereas in many applications (but not all) the write side can be simpler, and less onerous. This is true in the Book App.

To build my CQRS system I decided to make the read-side live in a different database to the write-side of the CQRS architecture, which allowed me to use a Cosmos DB database for my read-side. I did this because Cosmos DB is designed for performance (speed of queries) and scalability (how many requests it can handle). The figure below shows this two-database CQRS system.

The key point is the data saved in the Cosmos DB has as many of the calculations as possible pre-calculated, rather like the SQL (+cached) version – that’s what the projection stage does when a Book or its associated relationships are updated.

If you want to find out how to build a two-database CQRS system using Cosmos DB, then my article Building a robust CQRS database with EF Core and Cosmos DB describes one way, while chapter 16 of my book provides another way using events.

Limitations using Cosmos DB with EF Core

It was very interesting to work with Cosmos DB with EF Core as there were two parts to deal with:

  • Cosmos DB is a NoSQL database and works differently to a SQL database (read this Microsoft article for one view)
  • The EF Core 5 Cosmos DB database provider has many limitations.

I had already looked at these two parts back in 2019 and written an article, which I have updated to EF Core 5 and renamed to “An in-depth study of Cosmos DB and the EF Core 3 to 5 database provider”.

Some of the issues I encountered, starting with the ones that made the biggest difference to my Book App, are:

  • EF Core 5 limitation: Counting the number of books in Cosmos DB is SLOW!
  • EF Core 5 limitation: EF Core 5 cannot do subqueries on a Cosmos DB database.
  • EF Core 5 limitation: No relationships or joins.
  • EF Core 5 limitation: Many database functions not implemented.
  • Cosmos difference: Complex queries might need breaking up.
  • Cosmos difference: Skip is slow and expensive.
  • Cosmos difference: By default, all properties are indexed.

I’m not going to go through all of these – the “An in-depth study of Cosmos DB and the EF Core 3 to 5 database provider” article covers most of them.

Because of the EF Core limitation on counting books, I changed the way that paging works. Instead of picking the page you want, you have a Next/Prev approach, like Amazon uses (see the figure after the list of query approaches). And to allow a balanced performance comparison between the SQL versions and the Cosmos DB versions, I added the best two SQL approaches, but turned off counting too (SQL is slow at that).

It also turns out that Cosmos DB can count very fast, so I built another way to query Cosmos DB using its .NET (pseudo) SQL API. With this the Book App had four query approaches.

  1. Cosmos (EF): This accesses the Cosmos DB database using EF Core (with some parts using the SQL database where EF Core didn’t have the features to implement parts of the query).
  2. Cosmos (Direct): This uses Cosmos DB’s .NET SQL API and I wrote raw commands – a bit like using Dapper for SQL (see the sketch after this list).
  3. SQL (+cacheNC): This uses the SQL cache approach using the 100,000 books version, but with counting turned off to compare with Cosmos (EF).
  4. SQL (DapperNC): This uses Dapper, which has the best SQL performance, but with counting turned off to compare with Cosmos (EF).
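
To give an idea of what the Cosmos (Direct) approach in point 2 looks like, here is a minimal sketch using the Microsoft.Azure.Cosmos SDK. The container name, property names and the CosmosBook class are illustrative – they are not the Book App’s exact code.

public async Task ListTopBooksAsync(CosmosClient client)
{
    var container = client.GetContainer("BookAppDb", "Books");

    // Cosmos DB's SQL-like query language, with parameters for the paging values
    var query = new QueryDefinition(
            "SELECT * FROM c ORDER BY c.ReviewsAverageVotes DESC OFFSET @skip LIMIT @take")
        .WithParameter("@skip", 0)
        .WithParameter("@take", 100);

    using var iterator = container.GetItemQueryIterator<CosmosBook>(query);
    while (iterator.HasMoreResults)
    {
        foreach (var book in await iterator.ReadNextAsync())
            Console.WriteLine(book.Title); // do something with each book
    }
}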

The following figure shows the Book App in CQRS/Cosmos DB mode with the four query approaches, and the Prev/Next paging approach.

Performance of the CQRS/Cosmos DB version

To test the performance, I used an Azure SQL Server and Cosmos DB service from a local Azure site in London. To compare the SQL performance and the Cosmos DB performance I used databases with a similar cost (and low enough it didn’t cost me too much!). The table below shows what I used.

Database type    | Azure service name | Performance units     | Price/month
Azure SQL Server | Standard           | 20 DTUs               | $37
Cosmos DB        | Pay-as-you-go      | manual scale, 800 RUs | $47

I did performance tests on the Cosmos DB queries while I was adding books to the database to see if the size of the database affected performance. It’s hard to get a good test of this as there is quite a bit of variation in the timings.

The chart below compares EF Core calling Cosmos DB, referred to as Cosmos (EF), against using direct Cosmos DB commands via its NET SQL API – referred to as Cosmos (Direct).

This chart (and other timing I took) tells me two things:

  • The increase in the number of books in the database doesn’t have much effect on the performance (the Cosmos (Direct) 250,000 result is well within the variation).
  • Counting the books costs ~25 ms, which is much better than the SQL count, which added ~150 ms.

The important performance test was to look at Cosmos DB against the best of our SQL accesses. I picked a cross-section of sorting and filtering queries and ran them on all four query approaches – see the chart below.

From the timings in the figure above, here are some conclusions.

  1. Even the best SQL version, SQL (DapperNC), doesn’t work in this application because any sort or filter on the Reviews took so long that the connection timed out at 30 seconds.
  2. The SQL (+cacheNC) version was at parity or better with Cosmos DB (EF) on the first two queries, but as the query got more complex it fell behind in performance.
  3. The Cosmos DB (direct), with its book count, was ~25% slower than the Cosmos DB (EF) with no count but is twice as fast as the SQL count versions.

Of course, there are some downsides of the CQRS/Cosmos DB approach.

  • The add and update of a book to the Cosmos DB takes a bit longer: this is because the CQRS requires four database accesses (two to update the SQL database and two to update the Cosmos database) – that adds up to about 110 ms, which is more than double the time a single SQL database would take. There are ways around this (see this part of my article about CQRS/Cosmos DB) but it takes more work.
  • Cosmos DB takes longer and costs more if you skip items in its database. This shouldn’t be a problem with the Book App as many people would give up after a few pages, but if your application needs deep skipping through data, then Cosmos DB is not a good fit.

Even with the downsides I still think CQRS/Cosmos DB is a good solution, especially when I add in the fact that implementing this CQRS was easier and quicker than building the original SQL (+cache) version. Also, the Cosmos concurrency handling is easier than the SQL (+cache) version.

NOTE: What I didn’t test is Cosmos DB’s scalability or the ability to have multiple versions of the Cosmos DB database around the world. Mainly because it’s hard to do and it costs (more) money.

Performance against development effort

In the end it’s a trade-off of a) performance gain and b) development time. I have tried to summarise this in the following table, giving a number from 1 to 9 for difficulty (Diff? in the table) and performance (Perf? in the table).

The other thing to consider is how much more complexity your performance tuning adds to your application. Badly implemented performance tuning can make an application harder to enhance and extend. That is one reason why I like the event approach I used in the SQL (+cache) and CQRS / Cosmos DB versions, because it makes the fewest changes to the existing code.

Conclusion

As a freelance developer/architect I have had to performance tune many queries, and sometimes writes, on real applications. That’s not because EF Core is bad at performance, but because real-world applications have a lot of data and lots of relationships (often hierarchical), and it takes some extra work to get the performance the client needs.

I have already used a variation of the SQL (+cache) approach on a client’s app to improve the performance of their “has the warehouse got all the parts for this job?” query. And I wish Cosmos DB had been around when I built a multi-tenant service that needed to cover the whole of the USA.

Hopefully something in this article and video will be useful if (when!) you need to performance tune your application.

NOTE: You might like to look at the article “My experience of using modular monolith and DDD architectures” and its companion article to look at the architectural approaches I used on the Part3 Book App. I found the Modular Monolith architectural approach really nice.

I am a freelance developer who wrote the book “Entity Framework Core in Action“. If you need help performance tuning an EF Core application I am available for work. If you want to hire me please contact me to discuss your needs.

My experience of using the Clean Architecture with a Modular Monolith

Last Updated: March 18, 2021 | Created: February 11, 2021

In this article I look at my use of a clean architecture with the modular monolith architecture covered in the first article. Like the first article, this isn’t a primer on the Clean Architecture or the modular monolith, but is more about how I adapted the Clean Architecture to provide the vertical separation of the features in the modular monolith application.

  1. My experience of using modular monolith and DDD architectures.
  2. My experience of using the Clean Architecture with a Modular Monolith (this article).

Like the first article I’m going to give you my impression of the good and bad parts of the Clean Architecture, plus a look at whether the time pressure of the project (which was about 5 weeks late) made me “break” any rules.

TL;DR – summary

  • The Clean Architecture is like the traditional layered architecture, but with a series of rules that improve the layering.
  • I built an application using ASP.NET Core and EF Core with the Clean Architecture and the modular monolith approach. After this application was finished, I analysed how each approach had worked under time pressure.
  • I had used the Clean Architecture once before on a client’s project, but not with the modular monolith approach.
  • While the modular monolith approach had the biggest effect on the application’s structure, without the Clean Architecture layers the code would not have been as good.
  • I give you my views of the good, bad and possible “cracks under time pressure” for the Clean Architecture.
  • Overall I think the Clean Architecture adds some useful rules to the traditional layered architecture, but I had to break one of those rules to make it work with EF Core.

A summary of the Clean Architecture

NOTE: I don’t describe the modular monolith in this article because I did that in the first article. Here is a link to the modular monolith intro in the first article.

The Clean Architecture approach (also called the Hexagonal Architecture and Onion Architecture) is a development of the traditional “N-Layer” architecture (shortened to layered architecture). The Clean Architecture approach talks about “onion layers” wrapped around each other and has the following main rules:

  1. The business classes (typically the classes mapped to a database) are in the inner-most layer of the “onion”.
  2. The inner-most layer of the onion should not have any significant external code, e.g. NuGet packages, added to it. This is designed to keep the business logic as clean and simple as possible.
  3. Only the outer layer can access anything outside of the code. That means:
    1. The code that users access, e.g. ASP.NET Core, is in the outer layer.
    2. Any external services, like the database, email sending etc., are in the outer layer.
  4. Code in inner layers can’t reference any outer layers.

The combination of rules 3 and 4 could cause lots of issues as lower layers will need to access external services. This is handled by adding interfaces to the inner-most layer of the onion and registering the external services using dependency injection (DI).
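
As a concrete (if simplified) sketch of rules 3 and 4: the interface lives in the inner Domain layer, the implementation lives in an outer layer, and DI wires them together. The IEmailSender/SmtpEmailSender names below are illustrative, not from the Book App.

// In the inner-most (Domain) project – no external packages needed
public interface IEmailSender
{
    Task SendAsync(string toAddress, string subject, string body);
}

// In an outer-layer project that is allowed to talk to external services
public class SmtpEmailSender : IEmailSender
{
    public Task SendAsync(string toAddress, string subject, string body)
    {
        // ... call the real email service here
        return Task.CompletedTask;
    }
}

// In the ASP.NET Core composition root (outer layer), register the implementation
// services.AddTransient<IEmailSender, SmtpEmailSender>();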

The figure below shows how I applied the Clean Architecture to my application, which is an e-commerce web site selling books, called the Book App.

NOTE: I detail the modification that I made to the Clean Architecture approach around the persistence (database) layer later in the article.

Links to more detailed information on Clean Architecture (unmodified)

Setting the scene – the application and the time pressure

In 2020 I was updating my book “Entity Framework Core in Action” and I built an ASP.NET Core application that sells books, called the Book App. In the early chapters it is very simple, as I am describing the basics of EF Core, but in the last section I build a much more complex Book App that I progressively performance tuned, starting with 700 books, then 100,000 books and finally ½ million books. For the Book App to perform well it went through three significant enhancement stages. Here is an example of the Book App’s features and display, with four different ways to display the books to compare their performance.

At the same time, I was falling behind on getting the book finished. I had planned to finish all the chapters by the end of November 2020 when EF Core 5 was due out. But I only started the enhanced Book App in August 2020 so with 6 chapters to write I was NOT going to finish the book in November. So, the push was on to get things done! (In the end I finished writing the book just before Christmas 2020).

My experience of using Clean Architecture with a Modular Monolith

I had used a simpler Clean Architecture on a client’s project I worked on, so I had some ideas of what I would do. The Clean Architecture was useful, but it’s just another layered architecture with more rules, and I had to break one of its key rules to make it work with EF Core. Overall I think I would use my modified Clean Architecture again in a larger application.

A. What was good about Clean Architecture?

To explain how the Clean Architecture helps, we need to talk about the goals of the main architecture – the modular monolith. The modular monolith focuses on features (Kamil Grzybek calls them modules). One way to work would be to have one project per feature, but that has some problems.

  • The project would be more complex, as it has everything inside it.
  • You could end up with duplicating some code.

The Separation of Concerns (SoC) principle says breaking a feature up into parts that each focus on one aspect of the feature is a better way to go. So, the combination of the modular monolith and using layers provides a better solution. The figure below shows two modular monolith features running vertically, and the five Clean Architecture layers running horizontally. The figure has a lot in it, but it’s there to show:

  • Reduce complexity: A feature can be split up into projects spread across the Clean Architecture layers, thus making the feature easier to understand, test and refactor.
  • Removing duplication: Breaking up the features into layers stops duplication – feature 1 and 2 share the Domain and Persistence layers.

The importance of the Service Layer in the Clean Architecture layers

Many years ago, I was introduced to the concept of the Service Layer. There are many definitions of the Service Layer (try this definition), but for me it’s a layer that knows about the lower / inner layer data structures and knows about the front-end data structures and it can adapt between the two structures (see LHS of the diagram above). So, the Service Layer isolates lower layers from having to know how the front-end works.

For me the Service Layer is a very important layer.

  • It holds all the business logic or database accesses that the front-end needs, normally provided as services. This makes it much easier to unit test these services.
  • It takes on the job of adapting data to / from the front end. This means it is the only layer that has to care about the two different data structures.

NOTE: Some of my libraries, like EfCore.GenericServices and EfCore.GenericBizRunner, are designed to work as Service Layer type services, i.e. both libraries adapt between the lower / inner layer data structures and the front-end data structures.

Thus, the Infrastructure layer, which is just below the Service Layer, contains services that still work with the entity classes. In the Book App these projects contained code to seed the database, handle logging and provide event handling, while services in the Service Layer worked with both the lower / inner layer data structures and the front-end data structures.

To end the “good” part of the Clean Architecture I should say that a layered architecture could also provide the layering that the Clean Architecture defines. It’s just that the Clean Architecture has some more rules, most of which are useful.

B. Clean Architecture – what was bad?

The main problem was fitting the EF Core DbContext into the Clean Architecture. The Clean Architecture says that the database should be on the outer ring, with interfaces for the access. The problem is there is no simple interface that you can use for the application’s DbContext. Even if you use a repository pattern (which I don’t, and here is why), you still have the problem that the application’s DbContext has to be defined deep in the onion.

My solution was to put the EF Core code in the layer right next to the inner circle (named Domain) holding the entity classes – I called that layer persistence, as that’s what DDD calls it. That breaks one of the key rules of the Clean Architecture, but other than that it works fine. For other external services, such as an external email service, I would follow the Clean Architecture rules and add an interface in the inner (Domain) circle and register the service using DI.

Clean Architecture – how did it fare under time pressure?

Applying the Clean Architecture and Modular Monolith architectures together took a little more time to think through (I covered this in this section of the first article), but the end result was very good (I explain that in this section of the first article). The Clean Architecture layers broke a modular monolith feature into different parts, thus making the feature easier to understand and removing duplicate code.

The one small part of the clean architecture approach I didn’t like, but I stuck to, is that the Domain layer shouldn’t have any significant external packages, for instance a NuGet library, added to it. Overall, I applaud this rule as it keeps the Domain entities clean, but it did mean I had to do more work when configuring the EF Core code, e.g. I couldn’t use EF Core’s [Owned] attribute on entity classes. In a larger application I might break that rule.

So, I didn’t break any Clean Architecture rules because of the time pressure. The only rule I changed was to make it work with EF Core, but I might break the “Domain layer has no significant external packages” rule in the future.

Conclusion

I don’t think the Clean Architecture approach has as big an effect on the structure as the modular monolith did (read the first article), but the Clean Architecture certainly added to the structure by breaking modular monolith features into smaller, focused projects. The combination of the two approaches gave a really good structure.

My question is: does the Clean Architecture provide good improvements over a traditional layered architecture, especially as I had to break one of its key rules to work with EF Core? My answer is that using the Clean Architecture approach has made me a bit more aware of how I organise my layers, for instance I now have an infrastructure layer that I didn’t have before, and I appreciate that.

Please feel free to comment on what I have written about. I’m sure there are lots of people who have more experience with the Clean Architecture than me, so you can give your experience too.

Happy coding.

My experience of using modular monolith and DDD architectures

Last Updated: March 18, 2021 | Created: February 8, 2021

This article is about my experience of using a Modular Monolith architecture and Domain-Driven Design (DDD) approach on a small, but complex application using ASP.NET Core and EF Core. This isn’t a primer on modular monolith or DDD (but there is a good summary of each with links) but gives my views of the good and bad aspects of each approach.

  1. My experience of using modular monolith and DDD architectures (this article).
  2. My experience of using the Clean Architecture with a Modular Monolith.

I’m also interested in architecture approaches that help me to build applications with a good structure, even when I’m under some time pressure to finish – I want to have rules and patterns that help me to “do the right thing” even when I am under pressure. I was able to explore this aspect because I was privileged (?!) to be working on a project that was late😊.

TL;DR – summary

  • The Modular Monolith architecture breaks up the code into independent modules (C# projects) for each of the features needed in your application. Each module only links to other modules that specifically provide services it needs.
  • Domain-Driven Design (DDD) is a holistic approach to understanding, designing and building software applications.
  • I built an application using ASP.NET Core and EF Core with both of these architectural approaches. After this application was finished, I analysed how each approach had worked under time pressure.
  • It was the first time I had used the Modular Monolith architecture, but it came out with flying colours. The code structure consists of 22 projects, and each (other than the ASP.NET Core front-end) is focused on one specific part of the application’s features.
  • I have used DDD a lot and, as I expected, it worked really well in the application. The classes (called entities by DDD) have meaningfully named methods to update the data, so it is easy to understand.
  • I also give you my views of the good, bad and possible “cracks” that each approach has.

The architectural approaches covered in this article

At the start of building my application I knew the application would go through three major changes. I therefore wanted architectural approaches that make it easier to enhance the application. Here are the main approaches I will be talking about:

  1. Modular Monolith: building independent modules for each feature.
  2. DDD: Better separation and control of business logic

NOTE: I used a number of other architectures and patterns that I am not going to describe. They are layered architecture, domain events / integration events, and CQRS database pattern to name but a few.

1. Modular Monolith

A Modular Monolith is an approach where you build and deploy a single application (that’s the “Monolith” part), but you build the application in a way that breaks up the code into independent modules for each of the features needed in your application. This approach reduces the dependencies of a module in such a way that you can enhance/change a module without it affecting other modules. The figure below shows you 9 projects that implement two different features in the system. Notice they have only one common (small) project.

The benefits of a Modular Monolith approach over a normal Monolith are:

  • Reduced complexity: Each module only links to code that it specifically needs.
  • Easier to refactor: Changing a module has less or no effect on other modules.
  • Better for teams: Easier for developers to work on different parts of the code.

Links to more detailed information on modular monoliths

NOTE: An alternative to a monolith is to go to a Microservice architecture. A Microservice architecture allows lots of developers to work in parallel, but there are lots of issues around communications between each Microservice – read this Microsoft document comparing a Monolith vs. Microservice architecture. The good news is a Modular Monolith architecture is much easier to convert to a Microservice architecture because it is already modular.

2. Domain-Driven Design (DDD)

DDD is a holistic approach to understanding, designing and building software applications. It comes from the book Domain-Driven Design by Eric Evans. Because DDD covers so many areas I’m only going to talk about the DDD-styled classes mapped to the database (referred to as entity classes in this article) and a quick coverage of bounded contexts.

DDD says your entity classes must be in control of the data they contain; therefore, all the properties are read-only, with constructors / methods used to create or change the data in an entity class. That way the entity class’s constructors / methods can ensure a create / update follows the business rules for that entity.

NOTE: The above paragraph is a super-simplification of what DDD says about entity classes and there is so much more to say. If you are new to DDD then google “DDD entity” and your software language, e.g. C#.

Another DDD term is bounded contexts, which is about separating your application into distinct parts with very controlled interaction between the different bounded contexts. For example, my application is an e-commerce web site selling books, and I could see that the display of books was different from a customer ordering some books. There is shared data (like the product number and the price the book is sold for) but other than that they are separate.

The figure below shows how I separated the displaying of books and the ordering of books at the database level.

Using DDD’s bounded context technique, the Books bounded context can change its data (other than the product code and sale price) without it affecting the Orders bounded context, and the Orders code can’t change anything in the Books bounded context.

The benefits of DDD over a non-DDD approach are:

  • Protected business rules: The entity classes methods contain most of the business logic.
  • More obvious: entity classes contain methods with meaningful names to call.
  • Reduces complexity: Bounded context breaks an app into separate parts.

Links to more detailed information on DDD

Setting the scene – the application and the time pressure

In 2020 I was updating my book “Entity Framework Core in Action” and my example application was an ASP.NET Core application that sells books, called the Book App. In the early chapters the Book App is very simple, as I am describing the basics of EF Core. But in the last section I build a much more complex Book App that I progressively performance tuned, starting with 700 books, then 100,000 books and finally ½ million books. For the Book App to perform well it went through three significant enhancement stages. Here is an example of the Book App’s features and display, with four different ways to display the books to compare their performance.

At the same time, I was falling behind on getting the book finished. I had planned to finish all the chapters by the end of November 2020 when EF Core 5 was due out. But I only started the enhanced Book App in August 2020 so with 6 chapters to write I was NOT going to finish the book in November. So, the push was on to get things done! (In the end I finished writing the book just before Christmas 2020).

NOTE: The ASP.NET Core application I talk about in this article is available on GitHub. It is in branch Part3 of the repo https://github.com/JonPSmith/EfCoreinAction-SecondEdition and can be run locally.

My experience of each architectural approach

As well as experiencing each architectural approach while upgrading the application (what Neal Ford calls evolutionary architecture) I also had the extra experience of building the Book App under serious time pressure. This type of situation is great for learning whether the approaches worked or not, which is what I am now going to describe. But first here is a summary for you:

  1. Modular Monolith = Brilliant. First time I had used it and I will use it again!
  2. DDD = Love it: Used it for years and it’s really good.

Here are more details on each of these two approaches, where I point out:

  1. What was good?
  2. What was bad?
  3. How did it fare under time pressure?

1. Modular Monolith

I was already aware of the modular monolith approach, but I hadn’t used this approach in an application before. My experience was it worked really well: in fact, it was much better than I thought it would be. I would certainly use this approach again on any medium to large application.

1a. Modular Monolith – what was good?

The first good thing is how the modular monolith compared with the traditional “N-Layer” architecture (shortened to layered architecture). I have used the layered architecture approach many times, usually with four projects, but the modular monolith application has 22 projects, see figure to the right.

This means I know the code in each project is doing one job, and there are only links to other projects that are relevant to a project. That makes it easier to find and understand the code.

Also, I’m much more inclined to refactor the old code as I’m less likely to break something else. In contrast, the layered architecture on its own has lots of different features in one project and I can’t be sure what it’s linked to, which makes me more reluctant to refactor code as it might affect other code.

1b. Modular Monolith – what was bad?

Nothing was really wrong with the modular monolith, but working out the project naming convention took some time – and it was super important (see the list of projects in the figure above). With the right naming convention, a) the start of the name told me where in the layered architecture the project sat and b) the end of the name told me what it does. If you are going to try using a modular monolith approach, I recommend you think carefully about your naming convention.

I didn’t change names too often because of a development tool issue: in Visual Studio you can rename a project, but the underlying folder name isn’t changed, which makes the GitHub/folder display look wrong. The few I did change required me to rename the project, then rename the folder outside Visual Studio, and then hand-edit the solution file, which is horrible!

NOTE: I have since found a really nice tool that will do a project/folder rename and updates the solution/csproj files on applications that use Git for source control.

Also, I learnt not to end a project name with the name of a class, e.g. the Book class, as that caused problems if you referred to the Book class in that project. That’s why you see projects ending with “s”, e.g. “…Books” and “…Orders”.

1c. Modular Monolith – how did it fare under time pressure?

It certainly didn’t slow me down (other than the time deliberating over the project naming convention!) and it might have made me a bit faster than the layered architecture because I knew where to go to find the code. If I came back to work on the app after a few months, then it would be much quicker to find the code I am looking for, and it would be easier to change without affecting other features.

I did find myself breaking the rules at one point because I was racing to finish the book. The code ended up with 6 different ways to query the database for the book display, and there were a few common parts, like some of the constants used in the sort/filter display and one DTO. Instead of creating a BookApp.ServiceLayer.Common.Books project I just referred to the first project. That’s my bad, but it shows that while the modular monolith approach really helps separate the code, it does rely on the developers following the rules.

NOTE: I had to go back to the Book App to add a feature to two of the book display queries, so I took the opportunity to create a project called BookApp.ServiceLayer.DisplayCommon.Books which holds all the common code. That has removed the linking between the query features and made it clear what code is shared.

2. Domain-Driven Design

I have used DDD for many years and I find it an excellent approach which focuses on the business (domain) issues rather than the technical aspects of the application. Because I have used DDD so much it was second nature to me, but I will try to define what worked, what didn’t work etc. in the Book App, plus some feedback from working on clients’ applications.

2a. Domain-Driven Design – what was good?

DDD is so massive I am going to only talk about one of the key aspects I use every day – that is the entity class. DDD says your entity classes should be in complete control of the data inside them, and their direct relationships. I therefore make all the business classes and the classes mapped to the database have read-only properties, and use constructors and methods to create or update the data inside.

Making the entity’s properties read-only means your business logic/validation must be in the entity too. This means:

  • You know exactly where to look for the business logic/validation – it’s in the entity with its data.
  • You change an entity by calling an appropriately named method in the entity, e.g. AddReview(…). That makes it crystal clear what you are doing and what parameters it needs.
  • The read-only aspect means you can ONLY update the data via the entity’s methods. This is DRY, but more importantly it’s obvious where to find the code.
  • Your entity can never be in an invalid state because the constructors / methods will check the create/update and return an error if it would make the entity’s data invalid.

Overall, this means a DDD entity class is so much clearer to create and update than changing a few properties in a normal class. I love the clarity that the named methods provide – it’s obvious what I am doing.
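
Here is a simplified sketch of what such a DDD-styled entity looks like. It isn’t the Book App’s full Book class, and it assumes a Review class with a matching constructor.

public class Book
{
    private readonly List<Review> _reviews = new List<Review>();

    public int BookId { get; private set; }
    public string Title { get; private set; }
    public IReadOnlyCollection<Review> Reviews => _reviews.AsReadOnly();

    private Book() { } // needed by EF Core

    public Book(string title)
    {
        if (string.IsNullOrWhiteSpace(title))
            throw new ArgumentException("A book must have a title", nameof(title));
        Title = title;
    }

    // The only way to add a review – the business rule lives with the data
    public void AddReview(int numStars, string comment, string voterName)
    {
        if (numStars < 0 || numStars > 5)
            throw new ArgumentOutOfRangeException(nameof(numStars),
                "A review must be between 0 and 5 stars");
        _reviews.Add(new Review(numStars, comment, voterName));
    }
}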

I already gave you an example of the Books and Orders bounded contexts. That worked really well and was easy to implement once I understood how to use EF Core to map a class to a table as if it was a SQL view.
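
One way of doing that mapping (a sketch, not necessarily the exact Book App code) is EF Core’s ToView configuration, which lets the Orders bounded context read the Books table without migrations trying to create or change it. The BookView class and property subset are illustrative.

public class BookView   // read-only projection of the Books table for the Orders side
{
    public int BookId { get; private set; }
    public string Title { get; private set; }
    public decimal ActualPrice { get; private set; }
}

public class OrderDbContext : DbContext
{
    public DbSet<BookView> BookViews { get; set; }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // Maps BookView onto the existing Books table; this DbContext's
        // migrations won't try to create or alter it.
        modelBuilder.Entity<BookView>().ToView("Books");
        modelBuilder.Entity<BookView>().HasKey(b => b.BookId);
    }
}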

2b. Domain-Driven Design – what was bad?

One of the downsides of DDD is you have to write more code. The figure below shows this by  comparing a normal (non-DDD) approach on the left against a DDD approach on the right in an ASP.NET Core/EF Core application.

It’s not hard to see that you need to write more code, as the non-DDD version on the left is shorter. It might only be five or ten lines, but that mounts up pretty quickly when you have a real application with lots of entity classes and methods.

Having been using DDD for a long time I have built a library called EfCore.GenericServices, which reduces the amount of code you have to write. It does this by replacing the repository and the method call with a service. You still have to write the method in the entity class, but that library reduces the rest of the code to a DTO / ViewModel and a service call. The code below shows how you would use the EfCore.GenericServices library in an ASP.NET Core action method.

[HttpPost]
[ValidateAntiForgeryToken]
public async Task<IActionResult> AddBookReview(AddReviewDto dto, 
    [FromServices] ICrudServicesAsync<BookDbContext> service)
{
    if (!ModelState.IsValid)
    {
        return View(dto);
    }
    await service.UpdateAndSaveAsync(dto);

    if (service.IsValid)
        //Success path
        return View("BookUpdated", service.Message);

    //Error path
    service.CopyErrorsToModelState(ModelState, dto);
    return View(dto);
}

Over the years the EfCore.GenericServices library has saved me a LOT of development time. It can’t do everything, but it’s great at handling all the simple Create, Update and Delete work (known as CUD), leaving me to work on the more complex parts of the application.

2c. Domain-Driven Design – how did it fare under time pressure?

Yes, DDD does take a bit more time, but it’s (almost) impossible to bypass the way DDD works because of the design. In the Book App it worked really well, but I did use an extra approach known as domain events (see this article about this approach) which made some of the business logic easier to implement.

I didn’t break any DDD rules in the Book App, but on two projects I worked on, both clients found that calling DDD methods for every update didn’t work for their front-end. For instance, one client wanted to use JSON Patch to speed up the front-end (Angular) development.

To handle this I came up with a hybrid DDD style where non-business properties are read-write, but data with business logic/validation has to go through method calls. The hybrid DDD style is a bit of a compromise over DDD, but it certainly speeded up the development for my clients. In retrospect both projects worked well, but I do worry that the hybrid DDD style does allow a developer in a hurry to make a property read-write when they shouldn’t. If every entity class is locked down, then no one can break the rules.

Conclusion

I always like to analyse any application I work on after the project is finished. That’s because when I am still working on a project there is often pressure (often from myself) to “get it done”. By analysing a project after it’s finished, or a key milestone is met, I can look back more dispassionately and see if there is anything to learn from the project.

My main take-away from building the Book App is that the Modular Monolith approach was very good, and I would use it again. The modular monolith approach provides small, focused projects. That means I know all the code in the project is doing one job, and there are only links to other projects that are relevant to a project. That makes it easier to understand, and I’m much more inclined to refactor the old code as I’m less likely to break something else.

I would say that the hardest part of using the Modular Monolith approach was working out the naming convention of the projects, which only goes to prove the quote “There are only two hard things in Computer Science: cache invalidation and naming things” is still true😊.

DDD is an old friend to me, and while it needed a few more lines of code written, the result is a rock-solid design where every entity makes sure its data is in a valid state. Quite a lot of the validation and simple business logic can live inside the entity class, but business logic that cuts across multiple entities or bounded contexts can be a challenge. I have an approach to handle that, but I also used a new feature I learnt from a client’s project about a year ago and that helped too.

I hope this article helps you consider these two architectural approaches, or if you are using these approaches, they might spark some ideas. I’m sure many people have their own ideas or experiences so please do leave comments as I’m sure there is more to say on this subject.

Happy coding.

How to update a database’s schema without using EF Core’s migrate feature

Last Updated: January 27, 2021 | Created: January 26, 2021

This article is aimed at developers that want to use EF Core to access the database but want complete control over their database schema. I decided to write this article after seeing the EF Core Community standup covering the EF Core 6.0 Survey Results. In that video there was a page looking at the ways people deploy changes to production (link to video at that point), and quite a few respondents said they use SQL scripts to update their production database.

It’s not clear if people create those SQL scripts themselves or use EF Core’s Migrate SQL scripts, but Marcus Christensen commented during the Community standup (link to video at that point): “A lot of projects that I have worked on during the years, has held the db as the one truth for everything, so the switch to code first is not that easy to sell.”

To put that in context he was saying that some developers want to retain control of the database’s schema and have EF Core match the given database. EF Core can definitely do that, but in practice it gets a bit more complex (I know because I have done this on real-world applications).

TL;DR – summary

  • If you have a database that isn’t going to change much, then EF Core’s reverse engineering tool can create classes to map to the database and the correct DbContext and configurations.
  • If you are changing the database’s schema as the project progresses, then the reverse engineering on its own isn’t such a good idea. I cover three approaches to handle this:
    • Option 0: Have a look at EF Core 5’s improved migration feature to check if it works for you – it will save you time if it can work for your project.
    • Option 1: Use Visual Studio’s extension called EF Core Power Tools. This is reverse engineering on steroids and is designed for repeated database’s schema changes.
    • Option 2: Use the EfCore.SchemaCompare library. This lets you write EF Core code and update the database schema manually, and it tells you where they differ.

Setting the scene – what are the issues of updating your database schema yourself?

If you have a database that isn’t changing, then EF Core’s reverse engineering tool is a great fit. This reads your SQL database and creates the classes to map to the database (I call these classes entity classes) and a class you use to access the database, with EF Core configurations/attributes to define things in EF Core to match your database.
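
For reference, the reverse engineering tool is run with a command along these lines (the connection string and output folder are placeholders):

dotnet ef dbcontext scaffold "Server=(localdb)\mssqllocaldb;Database=BookApp;Trusted_Connection=True" Microsoft.EntityFrameworkCore.SqlServer --output-dir Models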

That’s fine for a fixed database as you can take the code the reverse engineering tool outputs and edit it to work the way you want it to. You can (carefully) alter the code that the reverse engineering tool produces to get the best out of EF Core’s features, like Value Converters, Owned types, Query Filters and so on.

The problems come if you are enhancing the database as the project progresses, because EF Core’s reverse engineering works, but some things aren’t so good:

  1. The reverse engineering tool has no way to detect useful EF Core features, like Owned Types, Value Converters, Query Filters, Table-per-Hierarchy, Table-per-Type, table splitting, and concurrency tokens, which means you need to edit the entity classes and the EF Core configurations.
  2. You can’t edit the entity classes or the DbContext class because you will be replacing them the next time you change your database. One way around this is to add to the entity classes with another class of the same name – that works because the entity classes are marked as partial (see the sketch after this list).
  3. The entity classes have all the possible navigational relationships added, which can be confusing if some navigational relationships would typically not be added because of certain business rules. Also, you can’t change the entity classes to follow a Domain-Driven Design approach.
  4. A minor point: you need to type in the reverse engineering command, which can be long, every time. I only mention it because Option 1 will solve that for you.
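
To illustrate the partial-class trick from point 2, here is a minimal sketch (the extra member is illustrative):

// File generated by the reverse engineering tool – overwritten on the next scaffold
public partial class Book
{
    public int BookId { get; set; }
    public string Title { get; set; }
}

// Your own file – survives re-scaffolding because both halves are partial
public partial class Book
{
    public string ShortTitle =>
        Title.Length <= 20 ? Title : Title.Substring(0, 20) + "...";
}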

So, if you want to have complete control over your database you have a few options, one of which I created (see Option 2). I start with a non-obvious approach considering the title of this article – that is, using EF Core to create a migration and tweaking it. I think it’s worth a quick look at this to make sure you’re not taking on more work than you need to – simply skip Option 0 if you are sure you don’t want to use EF Core migrations.

Option 0: Use EF Core’s Migrate feature

I have just finished updating my book Entity Framework Core in Action to EF Core 5 and I completely rewrote many of the chapters from scratch because there was so much change in EF Core (or my understanding of EF Core) – one of those complete rewrites was the chapter on handling database migrations.

I have to say I wasn’t a fan of EF Core’s migration feature, but after writing the migration chapter I’m coming around to using it. Partly that’s because I have more experience of real-world EF Core applications, but also because some of the new features, like MigrationBuilder.Sql(), give me more control over what the migration does.

The EF Core team want you to at least review, and possibly alter, a migration. Their approach is that the migration is rather like a scaffolded Razor Page (ASP.NET Core example), where it’s a good start, but you might want to alter it. There is a great video on EF Core 5 updated migrations and there was a discussion about this (link to video at the start of that discussion).

NOTE: If you decide to use the migration feature and manually alter the migration you might find Option 2 useful to double check your migration changes still matches EF Core’s Model of the database.

So, you might like to have a look at the EF Core migration feature to see if it might work for you. You don’t have to change much in the way you apply the SQL migration scripts, as the EF Core team recommends applying migrations via scripts anyway.

Option 1: Use Visual Studio’s extension called EF Core Power Tools

In my opinion, if you want to reverse engineer multiple times, then you should use Erik Ejlskov Jensen’s (known as @ErikEJ on GitHub and Twitter) Visual Studio extension called EF Core Power Tools. This allows you to run EF Core’s reverse engineering service via a friendly user interface, but that’s just the start. It provides a ton of options, some not even in EF Core’s reverse engineering, like reverse engineering SQL stored procs. All your options are stored, which makes subsequent reverse engineering just select and click. The extension also has lots of other features, like creating a diagram of your database based on your entity classes and EF Core configuration.

I’m not going to detail all the features in the EF Core Power Tools extension because Erik has done that already. Here are two videos as a starting point, plus a link to the EF Core Power Tools documentation.

So, if you are happy with the general type of output that reverse engineering produces, then the EF Core Power Tools extension is a very helpful tool with extra features over the EF Core reverse engineering tool. EF Core Power Tools is also specifically designed for continuous changes to the database, and Erik used it that way in the company he was working for.

NOTE: I talked to Erik and he said they use a SQL Server database project (.sqlproj) to keep the SQL Server schema under source control, and the resulting SQL Server .dacpac files to update the database and EF Core Power Tools to update the code. See this article for how Erik does this.

OPTION 2: Use the EfCore.SchemaCompare library

The problem with any reverse engineering approach is that you aren’t fully in control of the entity classes and the EF Core features. Just as developers want complete control over the database, I also want complete control of my entity classes and what EF Core features I can use. As Jeremy Likness said on the EF Core 6.0 survey video when database-first etc. were being talked about: “I want to model the domain properly and model the database properly and then use (EF Core) fluent API to map the two together in the right way” (link to video at that point).

I feel the same, and I built a feature I refer to as EfSchemaCompare – the latest version of this (I have versions going back to EF6!) is in the repo https://github.com/JonPSmith/EfCore.SchemaCompare. This library compares EF Core’s view of the database, based on the entity classes and the EF Core configuration, against any relational database that EF Core supports. That’s because, like EF Core Power Tools, I use EF Core’s reverse engineering service to read the database, so no extra coding for me to do.

This library allows me (and you) to create my own SQL scripts to update the database while using any EF Core feature I need in my code. I can then run the EfSchemaCompare tool and it tells me if my EF Core code matches the database. If they don’t match, it gives me detailed errors so that I can fix either the database or the EF Core code. Here is a simplified diagram of how EfSchemaCompare works.

The plus side of this is I can write my entity classes any way I like, and normally I use a DDD pattern for my entity classes. I can also use many of the great EF Core features like Owned Types, Value Converters, Query Filters, Table-per-Hierarchy, table splitting, and concurrency tokens in the way I want to. Also, I control the database schema – in the past I have created SQL scripts and applied them to the database using DbUp.

The downside is I have to do more work. The reverse engineering tool or the EF Core migrate feature could do part of the work, but I have decided I want complete control over the entity classes and the EF Core features I use. As I said before, I think the migration feature (and documentation) in EF Core 5 is really good now, but for complex applications, say working with a database that has non-EF Core applications accessing it, then the EfSchemaCompare tool is my go-to solution.

The README file in the EfCore.SchemaCompare repo contains all the documentation on what the library checks and how to call it. I typically create a unit test to check a database – there are lots of options to allow you to provide the connection string of the database you want to test against your entity classes and EF Core configuration provided by your application’s DbContext class.
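
To give an idea of the shape of such a unit test, here is a minimal sketch along the lines of the README’s examples – check the README for the exact class names, options and overloads, and note that the DbContext and its options are placeholders:

[Fact]
public void DatabaseMatchesEfCoreModel()
{
    // The application's DbContext, configured with the connection string
    // of the database you want to check
    using var context = new BookDbContext(_options);

    var comparer = new CompareEfSql();
    var hasErrors = comparer.CompareEfWithDb(context);

    // GetAllErrors lists every difference found, so a failure tells you what to fix
    Assert.False(hasErrors, comparer.GetAllErrors);
}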

NOTE: The EfCore.SchemaCompare library only works with EF Core 5. There is a version in the EfCore.TestSuport library version 3.2.0 that works with EF Core 2.1 and EF Core 3.? and the documentation for that can be found in the Old-Docs page. This older version has more limitations than the latest EfCore.SchemaCompare version.

Conclusion

So, if you, like Marcus Christensen, consider “the database as the one truth for everything”, then I have described two (maybe three) options you could use. Taking charge of your database schema/update is a good idea, but it does mean you have to do more work.

Using EF Core’s migration tool, with the ability to alter the migration, is the simplest, but some people don’t like that. The reverse engineering/EF Core Power Tools approach is the next easiest, as it will write the EF Core code for you. But if you want to really tap into EF Core’s features and/or DDD, then these approaches don’t cut it. That’s why I created the many versions of the EfSchemaCompare library.

I have used the EfSchemaCompare library on real-world applications, and I have also worked on client projects that used EF Core’s migration feature. The migration feature is much simpler, but sometimes it’s too easy, which means you don’t think enough about what the best schema for your database would be. But that’s not a problem with the migration feature itself, it’s our desire/need to quickly move on – you can change any migration EF Core produces if you want to.

I hope this article was useful to you on your usage of EF Core. Let me know your thoughts on this in the comments.

Happy coding.

Using ValueTask to create methods that can work as sync or async

Last Updated: January 25, 2021 | Created: January 23, 2021

In this article I delve into C#’s ValueTask struct, which provides a subset of the Task class features, and use its features to solve the problem of building libraries that need both sync and async versions of the library’s methods. Along the way I learnt something about ValueTask and how it works with sync code.

NOTE: Thanks to Stephen Toub, who works on the Microsoft .NET platform and wrote the article “Understanding the Whys, Whats, and Whens of ValueTask”, for confirming this is a valid approach that is used inside Microsoft’s code. His feedback, plus amoerie’s comment, helped me to improve the code to return the correct stack trace.

TL;DR – summary

  • Many of my libraries provide a sync and an async version of each method. This can mean I have to duplicate code, one method for the sync call and one for the async call, with just a few different calls, e.g. SaveChanges and SaveChangesAsync.
  • This article tells you how ValueTask (and ValueTask<TResult>) works when it returns without running an async method, and what its properties mean. I also have some unit tests to check this.
  • Using this information, I found a way to use C#’s ValueTask to build a single method that works as a sync or async method, selected by a boolean parameter. This removes a lot of duplicate code.
  • I have built some extension methods that check that the returned ValueTask a) didn’t use an async call, and b) if an exception was thrown in the method (which won’t bubble up on its own) they throw it so that it does bubble up.

Setting the scene – why I needed methods to work sync or async

I have built quite a few libraries, NuGet says I have 15 packages, and most are designed to work with EF Core (a few are for EF6.x). Five of these have both sync and async versions of the methods to allow the developer to use it whatever way they want to. This means I have to build some methods twice: one for sync and one for async, and of course that leads to duplication of code.

Normally I can minimise the duplication by building internal methods that return IQueryable<T>, but when I developed the EfCore.GenericEventRunner library I wasn’t querying the database but running sync or async code provided by the developer. The internal methods normally have lots of code with one or two methods that could be sync or async, e.g. SaveChanges and SaveChangesAsync.

Ideally, I wanted internal methods that I could call either sync or async, where a parameter told them whether to run sync or async, e.g.

  • SYNC:   var result = MyMethod(useAsync: false)
  • ASYNC: var result = await MyMethod(useAsync: true)
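
Here is a minimal sketch of that idea (my own illustration, not code from a library) – a single method where the useAsync parameter selects between SaveChanges and SaveChangesAsync:

//A minimal sketch: one method whose useAsync parameter selects between the
//sync and async EF Core calls. MyMethod is a hypothetical name and
//_context is assumed to be a DbContext field on the containing class.
private async ValueTask<int> MyMethod(bool useAsync)
{
    //... the main body of the method is identical for sync and async ...

    return useAsync
        ? await _context.SaveChangesAsync()
        : _context.SaveChanges();
}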

I found the amazingly good article by Stephen Toub called “Understanding the Whys, Whats, and Whens of ValueTask”, which explained ValueTask<TResult> and synchronous completion, and this got me thinking – can I use ValueTask to make a method that could work sync and async? And I could! Read on to see how I did this.

What happens when a ValueTask has synchronous completion?

The ValueTask (and ValueTask<TResult>) code is complex and linked to the Task class, and the documentation is rather short on explaining what a “failed operation” means. But from lots of unit tests and inspecting the internal data I worked out what happens with a sync return.

The ValueTask (and ValueTask<TResult>) have four bool properties. They are:

  • IsCompleted: This is true if the ValueTask has completed. So, if I captured the ValueTask and this was true, then it has finished, which means I don’t have to await it.
  • IsCompletedSuccessfully: This is true if no error happened. In a sync return it means no exception has been thrown.
  • IsFaulted: This is true if there was an error, and for a sync return that means an exception.
  • IsCanceled: This is true if the CancellationToken cancelled the async method. This is not used in a sync return.

From this information I decided I could check that a method had completed synchronously by testing that the IsCompleted property is true.
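
The following small xUnit test (my own sketch, using a hypothetical local function) shows what these properties look like when a ValueTask<TResult> completes synchronously:

[Fact]
public void TestValueTaskSyncCompletionProperties()
{
    //Hypothetical local function that completes synchronously
    static ValueTask<int> ReturnValueSync() => new ValueTask<int>(42);

    var valueTask = ReturnValueSync();

    Assert.True(valueTask.IsCompleted);              //finished, no await needed
    Assert.True(valueTask.IsCompletedSuccessfully);  //no exception was thrown
    Assert.False(valueTask.IsFaulted);               //no error
    Assert.Equal(42, valueTask.Result);
}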

The next problem was what to do when a method using ValueTask throws an exception. The exception isn’t bubbled up but is held inside the ValueTask, so I needed to extract that exception and throw it. A bit more unit testing and inspecting the ValueTask internals showed me how to extract the exception and throw it. Information provided by Stephen Toub showed a better way to throw the exception with the correct stacktrace.

NOTE: You can see the unit tests I used to work out how ValueTask and ValueTask<TResult> behave here.

So, I could call var valueTask = MyMethod(useAsync: false), inspect the returned valueTask to check it didn’t call any async methods inside it, and then call GetResult, which will throw an exception if there is one. The code below does this for a ValueTask (this ValueTaskSyncCheckers class also contains a similar method for ValueTask<TResult>).

This approach comes from Microsoft code where it is used (look at this code and search for “useAsync: false”). Stephen Toub told me that valueTask.GetAwaiter().GetResult() is the best way to end a ValueTask, even for the version that doesn’t return a result. That’s because:

  • If there was an exception, then that call will throw the exception inside the method with the correct stacktrace.
  • Stephen Toub said that you should call GetResult even in the version with no result because, if your method is used in a pooled resource, that call is typically used to tell the pooled resource it is no longer needed.

The listing below shows the two versions of the CheckSyncValueTaskWorked methods – the first is for ValueTask and the second for ValueTask<TResult>.

public static void CheckSyncValueTaskWorked(
    this ValueTask valueTask)
{
    if (!valueTask.IsCompleted)
        throw new InvalidOperationException(
            "Expected a sync task, but got an async task");
    valueTask.GetAwaiter().GetResult();
}

public static TResult CheckSyncValueTaskWorkedAndReturnResult
    <TResult>(this ValueTask<TResult> valueTask)
{
    if (!valueTask.IsCompleted)
        throw new InvalidOperationException(
             "Expected a sync task, but got an async task");
    return valueTask.GetAwaiter().GetResult();
}

NOTE: You can access these extension methods via this link.

How I used this feature in my libraries

I first used this in my EfCore.GenericEventRunner library, but those examples are complex, so instead I show a very simple example from my EfCore.SoftDeleteServices library. Here is a method that uses the useAsync parameter – see the return statement at the end of the code.

public static async ValueTask<TEntity> LoadEntityViaPrimaryKeys<TEntity>(this DbContext context,
    Dictionary<Type, Expression<Func<object, bool>>> otherFilters, 
    bool useAsync,
    params object[] keyValues)
    where TEntity : class
{
    // Lots of checks/exceptions left out 

    var entityType = context.Model.FindEntityType(typeof(TEntity));
    var keyProps = context.Model.FindEntityType(typeof(TEntity))
        .FindPrimaryKey().Properties
        .Select(x => x.PropertyInfo).ToList();

    var filterOutInvalidEntities = otherFilters
          .FormOtherFiltersOnly<TEntity>();
    var query = filterOutInvalidEntities == null
        ? context.Set<TEntity>().IgnoreQueryFilters()
        : context.Set<TEntity>().IgnoreQueryFilters()
            .Where(filterOutInvalidEntities);

    return useAsync
        ? await query.SingleOrDefaultAsync(
              CreateFilter<TEntity>(keyProps, keyValues))
        : query.SingleOrDefault(
              CreateFilter<TEntity>(keyProps, keyValues));
}

The following code shows the two callers – notice the sync version takes the ValueTask and then calls the CheckSyncValueTaskWorkedAndReturnResult method, while the async version uses the normal async/await approach.

SYNC VERSION

var entity= _context.LoadEntityViaPrimaryKeys<TEntity>(
    _config.OtherFilters, false, keyValues)
    .CheckSyncValueTaskWorkedAndReturnResult();
if (entity == null)
{
    //… rest of code left out

ASYNC VERSION

var entity = await _context.LoadEntityViaPrimaryKeys<TEntity>(
     _config.OtherFilters, true, keyValues);
if (entity == null) 
{
    //… rest of code left out

NOTE: I generally create the sync version of a library first, as it’s much easier to debug – async exception stacktraces are hard to read and the debug data can be harder to inspect. Once I have the sync version working, with its unit tests, then I build the async side of the library.

Conclusion

So, I used this sync/async approach in my EfCore.GenericEventRunner library, where the code is very complex, and it really made the job much easier. I then used the same approach in the EfCore.SoftDeleteServices library – again there was a complex class, called CascadeWalker, that “walks” the dependent navigational properties. In both cases this approach removed a significant amount of duplicated code.

You might not be building a library, but you have learnt how ValueTask behaves when it returns a sync result to an async call. ValueTask is there to make the sync return faster and, especially, to reduce memory usage. Also, you now have another approach if you have a similar sync/async need.

NOTE: ValueTask has a number of limitations, so I only use ValueTask in the internal parts of my libraries and provide a Task version to the users of my libraries.
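
As a sketch of what that looks like (my own illustration reusing the LoadEntityViaPrimaryKeys example above, not the library’s actual public API), the public method simply converts the internal ValueTask into a Task:

//Public, Task-based method for the library's users; the ValueTask stays internal.
//LoadEntityAsync is an illustrative name, and _context/_config are assumed fields.
public Task<TEntity> LoadEntityAsync<TEntity>(params object[] keyValues)
    where TEntity : class
    => _context.LoadEntityViaPrimaryKeys<TEntity>(
           _config.OtherFilters, useAsync: true, keyValues).AsTask();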

In case you missed it do read the excellent article “Understanding the Whys, Whats, and Whens of ValueTask” which explained ValueTask. And thanks again to Stephen Toub and amoerie’s comment for improving the solution.

Happy coding.

New features for unit testing your Entity Framework Core 5 code

Last Updated: January 21, 2021 | Created: January 21, 2021

This article is about unit testing applications that use Entity Framework Core (EF Core), with the focus on the new approaches you can use in EF Core 5. I start out with an overview of the different ways of unit testing in EF Core and then highlight some improvements brought in EF Core 5, plus some improvements I added to the EfCore.TestSupport version 5 library that is designed to help unit testing of EF Core 5.

NOTE: I am using xUnit in my unit tests. xUnit is well supported in NET and widely used (EF Core uses xUnit for its testing).

TL;DR – summary

NOTE: In this article I use the term entity class (or entity instance) to refer to a class that is mapped to the database by EF Core.

Setting the scene – why, and how, I unit test my EF Core code

I have used unit testing since I came back to being a developer (I had a long stint as a tech manager) and I consider unit testing one of the most useful tools to produce correct code. And when it comes to database code, which I write a lot of, I started with a repository pattern which is easy to unit test, but I soon moved away from the repository pattern to using Query Objects. At that point I had to work out how to unit test my code that uses the database. This was critical to me as I wanted my tests to cover as much of my code as possible.

In EF6.x I used a library called Effort which mocks the database, but when EF Core came out, I had to find another way. I tried EF Core’s In-Memory database provider, but EF Core’s SQLite database provider using an in-memory database was MUCH better. The SQLite in-memory database (I cover this later) is very easy to use for unit tests, but it has some limitations so sometimes I have to use a real database.

The reason why I unit test my EF Core code is to make sure it works as I expect. Typical things that I am trying to catch are:

  • Bad LINQ code: EF Core throws an exception if it can’t translate my LINQ query into database code. That will make the unit test fail.
  • Database writes that didn’t work: sometimes my EF Core create, update or delete didn’t work the way I expected. Maybe I left out an Include or forgot to call SaveChanges. By testing against the database I catch these problems straight away.

While some people don’t like using unit tests on a database, my approach has caught many, many errors which would have been hard to catch in the application. Also, a unit test gives me immediate feedback when I’m writing code and continues to check that code as I extend and refactor it.

The three ways to unit test your EF Core code

Before I start on the new features here is a quick look at the three ways you can unit test your EF Core code. The figure (which comes from chapter 17 of my book “Entity Framework Core in Action, 2nd edition”) compares three ways to unit test your EF Core code, with the pros and cons of each.

As the figure says, using the same type of database in your unit test as what your application uses is the safest approach – the unit test database accesses will respond just the same as your production system. In this case, why would I also list using an SQLite in-memory database in unit tests? While there are some limitations/differences from other databases, it does have many positives:

  1. The database schema is always up to date.
  2. The database is empty, which is a good starting point for a unit test.
  3. Running your unit tests in parallel works because each database is held locally in each test.
  4. Your unit tests will run successfully in the Test part of a DevOps pipeline without any other settings.
  5. Your unit tests are faster.

NOTE: Item 3, “running your unit tests in parallel”, is an xUnit feature which makes running unit tests much quicker. It does mean you need separate databases for each unit test class if you are using a non in-memory database. The library EfCore.TestSupport has features to obtain unique database names for SQL Server databases.

The last option, mocking the database, really relies on you using some form of repository pattern. I use this for really complex business logic where I build a specific repository pattern for the business logic, which allows me to intercept and mock the database access – see this article for this approach.

The new features in EF Core 5 that help with unit testing

I am going to cover four features that have changes in the EfCore.TestSupport library, either because of new features in EF Core 5, or improvements that have been added to the library.

  • Creating a unique, empty test database with the correct schema
  • How to make sure your EF Core accesses match the real-world usage
  • Improved SQLite in-memory options to dispose the database
  • How to check the SQL created by your EF Core code

Creating a unique, empty test database with the correct schema

To use a database in a xUnit unit test it needs to be:

  • Unique to the test class: that’s needed to allow for parallel running of your unit tests
  • Its schema must match the current Model of your application’s DbContext: if the database schema is different from what EF Core thinks it is, then your unit tests aren’t working on the correct database.
  • The database’s data should be a known state, otherwise your unit tests won’t know what to expect when reading the database. An empty database is the best choice.

Here are three ways to make sure the database fulfils these three requirements.

  1. Use an SQLite in-memory which is created every time.
  2. Use a unique database name, plus calling EnsureDeleted, and then EnsureCreated.
  3. Use unique database name, plus call the EnsureClean method (only works on SQL Server).

1. Use an SQLite in-memory which is created every time.

The SQLite database has an in-memory mode, which is applied by setting the connection string to “Filename=:memory:”. The database is then held within the connection, which makes it unique to its unit test, and the database hasn’t been created yet. This is quick and easy, but if your production system uses another type of database then it might not work for you.
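
For reference, here is a sketch of setting up an SQLite in-memory database by hand, without any helper library (it assumes a BookContext DbContext class and needs a using for Microsoft.Data.Sqlite):

//Create and open the SQLite in-memory connection - the database
//only exists while this connection stays open
var connection = new SqliteConnection("Filename=:memory:");
connection.Open();

var options = new DbContextOptionsBuilder<BookContext>()
    .UseSqlite(connection)
    .Options;

using var context = new BookContext(options);
context.Database.EnsureCreated();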

The EF Core documentation on unit testing shows one way to set up an in-memory SQLite database, but I use the EfCore.TestSupport library’s static method called SqliteInMemory.CreateOptions<TContext>, which will set up the options for creating an SQLite in-memory database, as shown in the code below.

[Fact]
public void TestSqliteInMemoryOk()
{
    //SETUP
    var options = SqliteInMemory.CreateOptions<BookContext>();
    using var context = new BookContext(options);

    context.Database.EnsureCreated();

    //Rest of unit test is left out
}

The database isn’t created at the start, so you need to call EF Core’s EnsureCreated method at the start. This means you get a database that matches the current Model of your application’s DbContext.

2. Unique database name, plus calling EnsureDeleted, then EnsureCreated

If you use a normal (non in-memory) database, then you need to make sure the database has a unique name for each test class (xUnit runs test classes in parallel, but methods in a test class are run serially). To get a unique database name the EfCore.TestSupport library has methods that take a base SQL Server connection string from an appsettings.json file and add the class name to the end of the database name (see code after next paragraph, and these docs).

That solves the unique database name, and we solve the “matching schema” and “empty database” by calling the EnsureDeleted method, then the EnsureCreated method. These two methods will delete the existing database and create a new database whose schema will match the EF Core’s current Model of the database. The EnsureDeleted / EnsureCreated approach works for all databases but is shown with SQL Server here.

[Fact]
public void TestEnsureDeletedEnsureCreatedOk()
{
    //SETUP
    var options = this.CreateUniqueClassOptions<BookContext>();
    using var context = new BookContext(options);
    
    context.Database.EnsureDeleted();
    context.Database.EnsureCreated();

    //Rest of unit test is left out
}

The EnsureDeleted / EnsureCreated approach used to be very slow (~10 seconds) for a SQL Server database, but since the new SqlClient came out in .NET 5 it is much quicker (~1.5 seconds), which makes a big difference to how long a unit test takes to run when using this EnsureDeleted + EnsureCreated version.

3. Unique database name, plus call the EnsureClean method (only works on SQL Server).

While I was asking some questions on the EF Core GitHub, Arthur Vickers described a method that could wipe the schema of an SQL Server database. This clever method removes the current schema of the database by deleting all the SQL indexes, constraints, tables, sequences, UDFs and so on in the database. It then, by default, calls the EnsureCreated method to return a database with the correct schema and empty of data.

The EnsureClean method is deep inside EF Core’s unit tests, but I extracted that code and built the other parts needed to make it useful, and it is available in version 5 of the EfCore.TestSupport library. The following listing shows how you use this method in your unit test.

[Fact]
public void TestSqlDatabaseEnsureCleanOk()
{
    //SETUP
    var options = this.CreateUniqueClassOptions<BookContext>();
    using var context = new BookContext(options);
    
    context.Database.EnsureClean();

    //Rest of unit test is left out
}

The EnsureClean approach is faster, maybe twice as fast as the EnsureDeleted + EnsureCreated version, which could make a big difference to how long your unit tests take to run. It is also better in situations where your database server won’t allow you to delete or create new databases but does allow you to read/write a database, for instance if your test databases are on a SQL Server where you don’t have admin privileges.

How to make sure your EF Core accesses match the real-world usage

Each unit test is a single method that has to a) set up the database ready for testing, b) run the code you are testing, and c) check that the results of the code you are testing are correct. The middle part, running the code, must reproduce the situation in which the code you are testing is normally used. But because all three parts are in one method it can be difficult to create the same state that the tested code normally runs in.

The issue of “reproducing the same state the test code is normally used in” is common to unit testing, but when testing EF Core code this is made more complicated by the EF Core feature called Identity Resolution. Identity Resolution is critical in your normal code as it makes sure you only have one entity instance of a class type that has a specific primary key (see this example). The problem is that Identity Resolution can make your unit test pass even when there is a bug in your code.

Here is a unit test that passes because of Identity Resolution. The test of the Price at the end of the unit test should fail, because SaveChanges wasn’t called (see the comment in the code). It passes because the entity instance in the variable called verifyBook wasn’t really read from the database – a tracked entity instance with the same key was found inside the DbContext, and that was returned instead of reading from the database.

[Fact]
public void ExampleIdentityResolutionBad()
{
    //SETUP
    var options = SqliteInMemory
        .CreateOptions<EfCoreContext>();
    using var context = new EfCoreContext(options);

    context.Database.EnsureCreated();
    context.SeedDatabaseFourBooks();

    //ATTEMPT
    var book = context.Books.First();
    book.Price = 123;
    // Should call context.SaveChanges()

    //VERIFY
    var verifyBook = context.Books.First();
    //!!! THIS IS WRONG !!! THIS IS WRONG
    verifyBook.Price.ShouldEqual(123);
}

In the past we fixed this with multiple instances of the DbContext, as shown in the following code

public void UsingThreeInstancesOfTheDbcontext()
{
    //SETUP
    var options = SqliteInMemory         
        .CreateOptions<EfCoreContext>(); 
    options.StopNextDispose();
    using (var context = new EfCoreContext(options)) 
    {
        //SETUP instance
    }
    options.StopNextDispose();   
    using (var context = new EfCoreContext(options)) 
    {
        //ATTEMPT instance
    }
    using (var context = new EfCoreContext(options)) 
    {
        //VERIFY instance
    }
}

But there is a better way to do this with EF Core 5’s ChangeTracker.Clear method. This method quickly removes all entity instances the DbContext is currently tracking. This means you can use one instance of the DbContext, but each stage, SETUP, ATTEMPT and VERIFY, is isolated, which stops Identity Resolution from giving you data from another stage. In the code below there are two potential errors that would slip through if you didn’t add calls to the ChangeTracker.Clear method (or used multiple DbContexts).

  • If the Include in the ATTEMPT stage was left out, the unit test would still pass (because the Reviews collection was set up in the SETUP stage).
  • If the SaveChanges was left out, the unit test would still pass (because the VERIFY read of the database would have been given the book entity from the ATTEMPT stage).

public void UsingChangeTrackerClear()
{
    //SETUP
    var options = SqliteInMemory
        .CreateOptions<EfCoreContext>();
    using var context = new EfCoreContext(options);

    context.Database.EnsureCreated();             
    var setupBooks = context.SeedDatabaseFourBooks();              

    context.ChangeTracker.Clear();                

    //ATTEMPT
    var book = context.Books                      
        .Include(b => b.Reviews)
        .Single(b => b.BookId == setupBooks.Last().BookId);
    book.Reviews.Add(new Review { NumStars = 5 });

    context.SaveChanges();                        

    //VERIFY
    context.ChangeTracker.Clear();                

    context.Books.Include(b => b.Reviews)         
        .Single(b => b.BookId == setupBooks.Last().BookId)
        .Reviews.Count.ShouldEqual(3);            
}

This is much better than the three separate DbContext instances because

  1. You don’t have to create the three DbContext scopes (saves typing, shorter unit test)
  2. You can use using var context = …, so no indents (nicer to write, easier to read)
  3. You can still refer to previous parts, say to get an entity’s primary key (see the use of setupBooks in the ATTEMPT and VERIFY stages)
  4. It works better with the improved SQLite in-memory disposable options (see next section)

Improved SQLite in-memory options to dispose the database

You have already seen the SqliteInMemory.CreateOptions<TContext> method earlier but in version 5 of the EfCore.TestSupport library I have updated it to dispose the SQLite connection when the DbContext is disposed. The SQLite connection holds the in-memory database so disposing it makes sure that the memory used to hold the database is released.

NOTE: In previous versions of the EfCore.TestSupport library I didn’t do that, and I haven’t had any memory problems. But the EF Core docs say you should dispose the connection, so I updated the SqliteInMemory options methods to implement the IDisposable interface.

It turns out that disposing the DbContext instance will dispose the options instance, which in turn disposes the SQLite connection. See the comments at the end of the code.

[Fact]
public void TestSqliteInMemoryOk()
{
    //SETUP
    var options = SqliteInMemory.CreateOptions<BookContext>();
    using var context = new BookContext(options);

    //Rest of unit test is left out
} // context is disposed at end of the using var scope,
  // which disposes the options that was used to create it, 
  // which in turn disposes the SQLite connection

NOTE: If you use multiple instances of the DbContext based on the same options instance, then you need to use one of these approaches to delay the dispose of the options until the end of the unit test.

How to check the SQL created by your EF Core code

If I’m interested in the performance of some part of the code I am working on, it is often easier to look at the SQL commands in a unit test than in the actual application. EF Core 5 provides two ways to do this:

  1. The ToQueryString method to use on database queries
  2. Capturing EF Core’s logging output using the LogTo method

1. The ToQueryString method to use on database queries

The ToQueryString method will turn an IQueryable variable built using EF Core’s DbContext into a string containing the database commands for that query. Typically, I output this string to xUnit’s output window so I can check it (see the _output.WriteLine call). The code below also contains a test of the created SQL code – I don’t normally do that, but I put it in so you can see the type of output you get.

public class TestToQueryString
{
    private readonly ITestOutputHelper _output;

    public TestToQueryString(ITestOutputHelper output)
    {
        _output = output;
    }

    [Fact]
    public void TestToQueryStringOnLinqQuery()
    {
        //SETUP
        var options = SqliteInMemory.CreateOptions<BookDbContext>();
        using var context = new BookDbContext(options);
        context.Database.EnsureCreated();
        context.SeedDatabaseFourBooks();

        //ATTEMPT 
        var query = context.Books.Select(x => x.BookId); 
        var bookIds = query.ToArray();                   

        //VERIFY
        _output.WriteLine(query.ToQueryString());        
        query.ToQueryString().ShouldEqual(               
            "SELECT \"b\".\"BookId\"\r\n" +              
            "FROM \"Books\" AS \"b\"\r\n" +              
            "WHERE NOT (\"b\".\"SoftDeleted\")");        
        bookIds.ShouldEqual(new []{1,2,3,4});            
    }
}

2. Capturing EF Core’s logging output using the LogTo method

EF Core 5 makes it much easier to capture the logging that EF Core outputs (before, you needed to create an ILoggerProvider class and register it with EF Core). Now you can add the LogTo method to your options builder and it will return a string output for every log. The code below shows how to do this, with the logs output to xUnit’s window.

[Fact]
public void TestLogToDemoToConsole()
{
    //SETUP
    var connectionString = 
        this.GetUniqueDatabaseConnectionString();
    var builder =                                   
        new DbContextOptionsBuilder<BookDbContext>()
        .UseSqlServer(connectionString)             
        .EnableSensitiveDataLogging()
        .LogTo(_output.WriteLine);


    // Rest of unit test is left out

The LogTo method has lots of different ways to filter and format the output (see the EF Core docs here). I created versions of the EfCore.TestSupport’s SQLite in-memory and SQL Server option builders that use LogTo, and I decided to use a class, called LogToOptions, to manage all the filters/formats for LogTo (LogTo requires calls to different methods). This allowed me to define better defaults for the logging output (defaults to: a) Information, not Debug, log level, b) no datetime in the output) and make it easier for the filters/formats to be changed.

I also added a feature that I use a lot, that is the ability to turn the log output on or off. The code below shows the SQLite in-memory option builder with the LogTo output, plus using the ShowLog feature. This only starts outputting logs once the database has been created and seeded – see the logToOptions.ShowLog = true line.

[Fact]
public void TestEfCoreLoggingCheckSqlOutputShowLog()
{
    //SETUP
    var logToOptions = new LogToOptions
    {
        ShowLog = false
    };
    var options = SqliteInMemory
         .CreateOptionsWithLogTo<BookContext>(
             _output.WriteLine, logToOptions);
    using var context = new BookContext(options);
    context.Database.EnsureCreated();
    context.SeedDatabaseFourBooks();

    //ATTEMPT 
    logToOptions.ShowLog = true;
    var book = context.Books.Single(x => x.Reviews.Count() > 1);

    //Rest of unit test left out
}

Information to existing user of the EfCore.TestSupport library

It’s no longer possible to detect the EF Core version via netstandard, so now it is done via the first number in the library’s version. For instance, EfCore.TestSupport version 5.?.? works with EF Core 5.?.?. At the same time the library was getting hard to keep up to date, especially with EfSchemaCompare in it, so I took the opportunity to clean up the library.

BUT that clean-up includes BREAKING CHANGES, mainly around the SQLite in-memory option builders. If you use SqliteInMemory.CreateOptions you MUST read this document to decide whether you want to upgrade or not.

NOTE: You may not be aware, but the NuGet packages in your test projects override the same NuGet packages installed by the EfCore.TestSupport library. So, as long as you add the newest versions of the EF Core NuGet libraries, the EfCore.TestSupport library will use those. The only part that won’t run is EfSchemaCompare, but that has now got its own library so you can use that directly.

Conclusion

I just read a great article on the Stack Overflow blog which said:

“Which of these bugs would be easier for you to find, understand, repro, and fix: a bug in the code you know you wrote earlier today or a bug in the code someone on your team probably wrote last year? It’s not even close! You will never, ever again debug or understand this code as well as you do right now, with your original intent fresh in your brain, your understanding of the problem and its solution space rich and fresh.”

I totally agree and that’s why I love unit tests – I get feedback as I am developing (I also like that unit tests will tell me if I broke some old code too). So, I couldn’t work without unit tests, but at the same time I know that unit tests can take a lot of development time. How do I solve that dilemma?

My answer to the extra development time isn’t to write fewer unit tests, but to build a library and develop patterns that make me really quick at unit testing. I also try to make my unit tests fast to run – that’s why I worry about how long it takes to set up the database, and why I like xUnit’s parallel running of unit tests.

The changes to the EfCore.TestSupport library and my unit test patterns due to EF Core 5 are fairly small, but each one reduces the number of lines I have to write for each unit test or makes the unit test easier to read. I think the ChangeTracker.Clear method is the best improvement because it does both – it’s easier to write and easier to read my unit tests.

Happy coding.

Updating many-to-many relationships in EF Core 5 and above

Last Updated: March 23, 2021 | Created: January 14, 2021

EF Core 5 added a direct many-to-many relationship with zero configuration (hurrah!). This article describes how to set up this direct many-to-many relationship and why (and how) you might want to configure it. I also cover the original many-to-many relationship (referred to as the indirect many-to-many from now on), where you need to create the class that acts as the linking table between the two classes.

You might be mildly interested that this is the third iteration of this article.  I wrote the first article on many-to-many on EF6.x in 2014, and another many-to-many article for EF Core in 2017. All of these got a lot of views, so I had to write a new article once EF Core 5 came out. I hope you find it useful.

All the information and the code come from Chapter 2 of the second edition of my book, Entity Framework Core in Action, which covers EF Core 5. In this book I build a book selling site, called the Book App, where each book has two many-to-many relationships:

  1. A direct many-to-many relationship to a Tag entity class (I refer to classes that EF Core maps to a database as entity classes). A Tag holds a category (for instance: Microsoft .NET or Web) which allows users to pick books by their topic.
  2. An indirect many-to-many relationship to the Author entity class, which provides an ordered list of the Authors of the book, for instance: by Dino Esposito, Andrea Saltarello.

Here is an example of how it displays each book to the user – this is a fictitious book I used for many of my tests.

Overall summary and links to each section summary

For people who are in a hurry I have ended each section with a summary. Here are the links to the summaries:

The overall summary is:

  • Direct many-to-many relationships are super simple to configure and use, but by default you can’t access the linking table.
  • Indirect many-to-many relationships take more work to set up, but you can access the linking table. This allows you to put specific data in the linking table, such as the order in which you want to read the items back.

Here are the individual summaries (with links).

NOTE: All the code you see in this article comes from the companion GitHub repo to my book Entity Framework Core in Action. Here is a link to the directory the entity classes are in, and many of the code examples come from the Ch03_ManyToManyUpdate and Ch03_ManyToManyCreate unit test classes.

Setting the scene – the database and the query

Let’s start by seeing the finished database and how the query works. You can skip this, but having an overall view of what is going on may help you when you get to the specific part you are interested in. Let’s start with the database.

This shows the two many-to-many relationships – both have a linking table, but the direct many-to-many has its linking table created by EF Core.

Next, let’s see the many-to-many queries and how they relate to the book display in the figure below.

You can see that the Book’s Authors (top left) needs to be ordered – that Order property (a byte) is in the linking entity class. But for the Book’s Tags (bottom left), which don’t have an order, the query is much simpler to write because EF Core will automatically add the extra SQL needed to use the hidden linking table.

Now we get into the detail of setting up and using both of these types of many-to-many relationships.

Direct many-to-many setup – normal setup.

The setting up of the direct many-to-many relationship is done automatically (this is known as By Convention configuration in EF Core). And when you create your database via EF Core, it will add the linking table for you.
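
As a sketch (using the property names from this article, with other properties left out), all you need is a collection navigational property on both ends for EF Core 5 to configure the direct many-to-many, and its hidden linking table, by convention:

public class Book
{
    public int BookId { get; set; }
    public string Title { get; set; }
    //... other properties left out

    public ICollection<Tag> Tags { get; set; }   //direct many-to-many to Tag
}

public class Tag
{
    [Key]
    [MaxLength(40)]
    public string TagId { get; set; }            //holds the category name

    public ICollection<Book> Books { get; set; }
}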

This is super simple to do – so much easier than the indirect many-to-many. But if you want to add extra data in the linking table, say for ordering or filtering, then you either alter the direct many-to-many or use the indirect many-to-many approach.

NOTE: The direct many-to-many relationship is only automatically configured if you have a collection navigational property on both ends. If you only want a navigational property on one end, then you will need to use the Fluent API configuration (see next section), for instance …HasMany(x => x.Tags).WithMany(), where the Tag has no navigational property back to the Books.

Direct many-to-many setup: When you want to define the linking table

You can define an entity class and configure the linking table, but I will say that if you are going to do that you might as well use the indirect approach as I think it’s easier to set up and use.

Typically, you would only define the linking table if you wanted to add extra data. There are two steps in this process:

1. Creating a class to map to the linking table

Your entity class must have the two primary/foreign keys from each end of the many-to-many link, in this case the BookId and the TagId. The code below defines the minimum properties needed to be the linking table – you can add extra properties as normal, but I leave that to you.

public class BookTag
{
    public int BookId { get; set; }

    [Required]
    [MaxLength(40)]
    public string TagId { get; set; }

    //You can add extra properties here

    //relationships

    public Book Book { get; private set; }
    public Tag Tag { get; private set; }
} 

You could add properties such as the Order property needed for the Author ordering, or maybe a property to use for soft delete. That’s up to you and doesn’t affect the configuration step that comes next.

2. Configuring the linking table in the OnModelCreating method

Now you have to configure the many-to-many linking class/table with the UsingEntity method in the OnModelCreating method in your application’s DbContext, as shown in the code below.

public class EfCoreContext : DbContext
{
    //Other code left out to focus on many-to-many
 
    protected override void OnModelCreating(ModelBuilder modelBuilder) 
    {
        //Other configuration left out to focus on many-to-many
 
        modelBuilder.Entity<Book>().HasMany(x => x.Tags)
                .WithMany(x => x.Books)
                .UsingEntity<BookTag>(
                    bookTag => bookTag.HasOne(x => x.Tag)
                        .WithMany().HasForeignKey(x => x.TagId),
                    bookTag => bookTag.HasOne(x => x.Book)
                        .WithMany().HasForeignKey(x => x.BookId));
    }
}

You can see the EF Core document on this here.

NOTE: I really recommend an excellent video produced by the EF Core team which has a long section on the new, direct many-to-many, including how to configure it to include extra data.

Direct many-to-many usage – querying

Querying the direct many-to-many relationship is quite normal. Here are a couple of example queries:

  • Load all the Books with their Tags
    var books = context.Books.Include(b => b.Tags).ToList()
  • Get all the books with their TagIds (each TagId holds a category name)
    var booksWithTags = context.Books
        .Select(b => new { b.Title, TagIds = b.Tags.Select(t => t.TagId) })
        .ToList();

EF Core will detect that your query is using a direct many-to-many relationship and add the extra SQL to use the hidden linking table to get the correct entity instances on the other end of the many-to-many relationship.

Direct many-to-many usage: Add a new link

Adding another many-to-many link to an existing entity class is easy – you just add the existing entry to the direct many-to-many navigational collection property. The code below shows how to add an existing Tag to a book that already has one Tag.

var book = context.Books
    .Include(p => p.Tags)
    .Single(p => p.Title == "Quantum Networking"); 

var existingTag = context.Tags         
    .Single(p => p.TagId == "Editor's Choice");

book.Tags.Add(existingTag);
context.SaveChanges();

When you add the existing Tag into the Tags collection EF Core works out you want a linking entry created between the Book and the Tag. It then creates that new link.

A few things to say about this:

  • You should load the existing Tags using the Include method, otherwise you could lose any existing links to Tags.
  • You MUST load the existing Tag from the database to add to the Tags navigational collection. If you simply created a new Tag, then EF Core will add that new Tag to the database.

ADVANCE NOTES on navigational collection properties

Point 1: Let me explain why I say “You should load the existing Tags…” above. There are two situations:

  • If you add an empty navigational collection on the initialization of the class, then you don’t have to add the Include method, as an Add will work (but I don’t recommend this – see below).
  • If your navigational collection is null after construction, then you MUST load the navigational collection, otherwise your code will fail.

Overall, I recommend loading the navigational collection using the Include method, even if the navigational collection has been initialized to an empty collection. Without the Include the entity doesn’t match the database, which I try to avoid, as a future refactor might assume it did match the database.

Point 2: If you are adding a new entry (or removing an existing linking relationship) in a collection with LOTs of items in it, then you might have a performance issue with using an Include. In this case you can create (or delete, for removing a link – see below) the linking table entry directly. For a direct many-to-many relationship, you would need to create a property bag of the right form to add.

NOTE These ADVANCE NOTES also apply to the adding a new indirect many-to-many link.

Direct many-to-many usage: Remove a link

To remove a link to an entity that is already in the navigation property collection, you simply remove that entity instance from the collection. The code below shows removing an existing Tag using the Remove method.

var book = context.Books
    .Include(p => p.Tags)
    .First();

var tagToRemove = book.Tags
    .Single(x => x.TagId == "Editor's Choice");
book.Tags.Remove(tagToRemove);
context.SaveChanges();

This is just like adding a link, but in this case EF Core works out what linking entity needs to be deleted to remove this relationship.

Direct many-to-many usage: Create Book with tags

To add Tags when you first create a book you just add each Tag to the Tags collection. The code below adds two existing Tags to a new book (note that I haven’t set up the Author – see this part for how you do that).

var existingTag1 = context.Tags.Single(t => t.TagId == "Tag1");
var existingTag2 = context.Tags.Single(t => t.TagId == "Tag2");
var newBook = new Book()
{
    Title = "My Book",
    //... other property settings left out
    
    //Set your Tags property to an empty collection
    Tags = new List<Tag>()
};
newBook.Tags.Add(existingTag1);
newBook.Tags.Add(existingTag2);
context.Add(newBook);
context.SaveChanges();

Indirect many-to-many setup – configuring the linking table

An indirect many-to-many relationship takes a bit more work, but it does allow you to use extra data that you can put into the linking table. The figure below shows the three entity classes, Book, BookAuthor, and Author, which define the many-to-many relationship.

This is more complex because you need to define the linking entity class, BookAuthor, so that you can add extra data to the linking table and also access that extra data when you query the data.

EF Core will automatically detect the relationships because of all the navigational properties. But the one thing it can’t automatically detect is the composite primary key in the BookAuthor entity class. The code below shows how to do that.

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder.Entity<BookAuthor>() 
        .HasKey(x => new {x.BookId, x.AuthorId});
} 

NOTE: Like the direct many-to-many configuration, if you leave out any of the four navigational properties, then it won’t set up that part of the many-to-many. You will then have to add Fluent API commands to set up the relationships.

Indirect many-to-many usage – querying

The indirect query is more complex, but that’s because you want to order the Authors’ names.

  • Load all the Books with their BookAuthor and Author entity classes
    var books = context.Books
         .Include(book => book.AuthorsLink)
         .ThenInclude(ba => ba.Author)
         .ToList();
  • Load all the Books with their BookAuthor and Author entity classes, and make sure the Authors are in the right order
    var books = context.Books
         .Include(book => book.AuthorsLink.OrderBy(ba => ba.Order))
         .ThenInclude(ba => ba.Author)
         .ToList();
  • Get all the Books’ Titles with the authors’ names ordered and returned as a comma delimited string
    var books = context.Books.Select(book => new
    {
        Title = book.Title,
        AuthorsString = string.Join(", ",
            book.AuthorsLink.OrderBy(ba => ba.Order)
                .Select(ba => ba.Author.Name))
    }).ToList();

NOTE: ordering within the Include method is also a new feature in EF Core 5.

Indirect many-to-many usage – add a new link

To add a new many-to-many relationship link you need to add a new instance of the linking entity class, in our example that is a BookAuthor entity class, and set up the two relationships, in this example filling in the Book and Author singleton navigational properties. This is shown in the code below, where we set the Order to a value that adds the new Author on the end (the first Author has an Order of 0, the second Author is 1, and so on).

var existingBook = context.Books                           
    .Include(p => p.AuthorsLink)                   
    .Single(p => p.Title == "Quantum Networking"); 

var existingAuthor = context.Authors          
    .Single(p => p.Name == "Martin Fowler");

existingBook.AuthorsLink.Add(new BookAuthor  
{
    Book = existingBook,  
    Author = existingAuthor,  
    // We set the Order to add this new Author on the end
    Order = (byte) existingBook.AuthorsLink.Count
});
context.SaveChanges();

A few things to say about this (the first two are the same as the direct many-to-many add):

  • You should load the Book’s AuthorsLink using the Include method, otherwise you will lose any existing links to Authors.
  • You MUST load the existing Author from the database to add to the BookAuthor linking entity. If you simply created a new Author, then EF Core will add that new Author to the database.
  • Technically you don’t need to set the BookAuthor’s Book navigational property because you added the new BookAuthor instance to the Book’s AuthorsLink, which also tells EF Core that this BookAuthor is linked to the Book. I put it in to make it clear what the Book navigational property does.

Indirect many-to-many usage – removing a link

To remove a many-to-many link, you need to remove (delete) the linking entity. In this example I have a book with two Authors, and I remove the link to the last Author – see the code below.

var existingBook = context.Books
    .Include(book => book.AuthorsLink
        .OrderBy(x => x.Order))
    .Single(book => book.BookId == bookId);

var linkToRemove = existingBook.AuthorsLink.Last();
context.Remove(linkToRemove);
context.SaveChanges();

This works, but you have the problem of making sure the Order values are correct. In the example code I deleted the last BookAuthor linking entity so it wasn’t a problem, but if I deleted any BookAuthor other than the last I should recalculate the Order values for all the remaining Authors, otherwise a later update might get the order of the Authors wrong.
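
Here is a sketch (my own illustration, not code from the book) of re-numbering the remaining Order values after a BookAuthor has been removed and saved:

//Re-number the remaining BookAuthor links so the Order values run 0, 1, 2...
var remainingLinks = existingBook.AuthorsLink
    .OrderBy(ba => ba.Order).ToList();
for (byte i = 0; i < remainingLinks.Count; i++)
    remainingLinks[i].Order = i;
context.SaveChanges();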

NOTE: You can also remove the BookAuthor by removing it from the Book’s AuthorsLink collection, like you did for the direct many-to-many remove. Both approaches work.

Indirect many-to-many usage – Create Book with Authors

To add Authors when you first create a book you need to add a BookAuthor linking class for each author in the book, setting the Order property to define the order that the Authors should be displayed in. The code below adds two existing Authors to a new book.

var existingAuthor1 = context.Authors
    .Single(a => a.Name == "Author1");
var existingAuthor2 = context.Authors
    .Single(a => a.Name == "Author2");
var newBook = new Book()
{
    Title = "My Book",
    //... other property settings left out

    //Set your AuthorsLink property to an empty collection
    AuthorsLink = new List<BookAuthor>()
};
newBook.AuthorsLink.Add(new BookAuthor
{
    Book = newBook,
    Author = existingAuthor1,
    Order = 0  //First author
});
newBook.AuthorsLink.Add(new BookAuthor
{
    Book = newBook,
    Author = existingAuthor2,
    Order = 1  //second author
});
context.Add(newBook);
context.SaveChanges();

Conclusion

So, since EF Core 5, you have two ways to set up a many-to-many relationship – the original indirect approach (Book-BookAuthor-Author) and the new direct approach (Book-Tags). The new direct many-to-many is really easy to use, but as you have seen, sometimes the original indirect approach is the way to go when you want to do more than simply link two entity classes.

If you didn’t find this link before, I really recommend an excellent video produced by the EF Core team which has a long section on the new, direct many-to-many, including how to configure it to include extra data.

All the best with your EF Core coding and do have a look at my GitHub page to see the various libraries I have created to help build and test EF Core applications.

Happy coding.

Introducing the EfCore.SoftDeleteServices library to automate soft deletes

Last Updated: January 13, 2021 | Created: January 12, 2021

Following on from my article “EF Core In depth – Soft deleting data with Global Query Filters”, I have built a library, EfCore.SoftDeleteServices (referred to as the Soft Delete library from now on), which provides services that automate the methods you need, that is: soft delete, find soft deleted items, un-soft delete, and hard delete a soft deleted item. In addition, this library provides solutions to various soft delete issues, such as handling multiple query filters and mimicking the cascade delete feature that SQL databases provide.

NOTE: The library is available on GitHub at https://github.com/JonPSmith/EfCore.SoftDeleteServices and on NuGet as EfCore.SoftDeleteServices.

Also, readers of my first article who starred the project should know I had a Git issue (my fault) and I renamed the first GitHub version to EfCore.SoftDeleteServices-Old and restarted the project. Please link to the new repo to make sure you are kept up to date.

TL;DR – summary

  • Soft delete is the term used when you “hide” a row in your database instead of deleting it. You can implement this using EF Core’s Query Filter feature. See this link for more explanation.
  • The EfCore.SoftDeleteServices library solves a lot of issues when implementing a soft delete feature.
    • It provides code that can automatically configure the Query Filters for your soft delete entities.
    • It is very configurable, with you deciding where you want to put the soft delete value – in a bool property, a bit in a [Flags] enum, a shadow property, or a Domain-Driven Design entity with methods.
    • The library can handle Query Filters that contain other filters, like multi-tenant control. It makes sure the other filters are still applied, which means you are never in a situation where the multi-tenant filter isn’t used. That’s a very good security feature!
    • It has a cascade soft delete feature that mimics what a normal (hard) delete does. This is useful in places where another part of your code accesses the dependent relationships of an entity that was soft deleted. That stops incorrect results – see this section.
    • It provides a method to register your configuration with DI, which will automatically register the right version of the Soft Delete service for you to use.
  • The library has documentation and is available on NuGet as EfCore.SoftDeleteServices.

Setting the scene – what are the issues around implementing soft delete?

If you want to soft delete an entity instance (I use the term entity instance or entity class to refer to a class that has been mapped by EF Core to a database) by using EF Core’s Query Filter feature, then you need to do four things:

  1. Add a boolean property, say SoftDeleted, to your entity class.
  2. Configure a Query Filter on that entity class using the SoftDeleted property
  3. Build code to set/reset the SoftDeleted property
  4. Build code to find the soft deleted entities using the IgnoreQueryFilters method

NOTE: I show these four stages in this section of my “EF Core In depth – Soft deleting data with Global Query Filters” article.

None of these steps are hard to do, but if we are really trying to mimic the way that the database deletes things, then you do need to be a bit more careful. I covered these issues, with some solutions, in the previous article, but here is the list:

  • If your Query Filter contains other filters, like multi-tenant control, then things get more complex (see this explanation, and the sketch after this list).
  • It’s not a good idea to soft delete a one-to-one relationship, because EF Core will throw errors if you try to add a new version (see this explanation).
  • The basic soft delete doesn’t mimic what a normal (hard) delete does – a hard delete would, by default, delete any dependent rows too. This is solved by the cascade soft delete part of the Soft Delete library.
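
To show why the first issue matters, here is a sketch of a Query Filter that keeps a multi-tenant filter working alongside soft delete – the TenantId property and _tenantId field are hypothetical, not part of this article’s Book entity:

//Both conditions live in the one Query Filter - if the soft delete part
//replaced the whole filter, the multi-tenant protection would be lost.
modelBuilder.Entity<Book>()
    .HasQueryFilter(b => !b.SoftDeleted && b.TenantId == _tenantId);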

The EfCore.SoftDeleteServices library is designed to overcome all of these issues and gives you a few more options too. In the next section I describe a simple example of using this library.

An example of using the EfCore.SoftDeleteServices library

Let’s start with the most used feature – soft deleting a single entity. The starting point is to create an interface and then add that interface to the entity classes which you want to soft delete. In this example I am going to add a boolean SoftDeleted property to the Book entity class.

1. Using an interface to define what entities you want to soft delete

public interface ISingleSoftDelete
{
    bool SoftDeleted { get; set;  }
}
public class Book : ISingleSoftDelete
{
    public int Id { get; set; }

    public bool SoftDeleted { get; set; }
    //… rest of class left out
}

2. Setting up Query Filters to your entities

You need to add a Query Filter to every soft delete entity class. You can write the code for each entity class, which I show next, or you can automate adding a Query Filter to every entity class, which I show after.

The manual setup goes in the OnModelCreating in your application’s DbContext – see the code below.

public class EfCoreContext : DbContext
{
    //Other code left out to focus on Soft delete
 
    protected override void OnModelCreating(
        ModelBuilder modelBuilder) 
    {
        //Other configuration left out to focus on Soft delete
 
        modelBuilder.Entity<Book>()
           .HasQueryFilter(p => !p.SoftDeleted);
    }
}

But, as I said, I recommend automating your query filters by running code inside your OnModelCreating method that looks at all the entity classes and adds a Query Filter to every entity class that has your soft delete interface, in this example ISingleSoftDelete. I have already described how to do this in this section of the article on soft deletion. You can also find some example code to do that in this directory of the Soft Delete’s GitHub repo.
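
Here is a minimal sketch of that automation (my own version, assuming the ISingleSoftDelete interface shown above; it needs a using for System.Linq.Expressions):

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    foreach (var entityType in modelBuilder.Model.GetEntityTypes())
    {
        //Only add the filter to entity classes that implement the soft delete interface
        if (typeof(ISingleSoftDelete).IsAssignableFrom(entityType.ClrType))
        {
            //Builds the filter: entity => !entity.SoftDeleted
            var parameter = Expression.Parameter(entityType.ClrType, "entity");
            var filter = Expression.Lambda(
                Expression.Not(Expression.Property(
                    parameter, nameof(ISingleSoftDelete.SoftDeleted))),
                parameter);
            modelBuilder.Entity(entityType.ClrType).HasQueryFilter(filter);
        }
    }
}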

3. Configuring the soft delete library to your requirements

You need to create a configuration class which will tell the Soft Delete library what to do when you call one of its methods. The class below defines which entity classes are covered via your interface (in this case ISingleSoftDelete) and how to get/set the soft delete property, plus other things, like whether you want the single or cascade soft delete service, and it gives the library access to your application’s DbContext.

Your configuration must inherit either the SingleSoftDeleteConfiguration<TInterface> or the CascadeSoftDeleteConfiguration<TInterface> class – which one you use will define what service/features it provides.
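
Here is a minimal sketch of such a configuration class, patterned on the multi-filter example shown later in this article (the constructor takes your application’s DbContext – EfCoreContext is used here just for illustration):

public class ConfigSoftDeleted :
    SingleSoftDeleteConfiguration<ISingleSoftDelete>
{
    public ConfigSoftDeleted(EfCoreContext context)
        : base(context)
    {
        //Tell the library how to read and write the soft delete property
        GetSoftDeleteValue = entity => entity.SoftDeleted;
        SetSoftDeleteValue = (entity, value) => entity.SoftDeleted = value;
    }
}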

NOTE: While I show the SoftDeleted property as a boolean type, you could make it part of, say, a flags enum. The only rule is you must be able to get and set the property using a true/false value.

4. Setting up the Soft Delete services

To use the Soft Delete library, you need to get an instance of its service. You can create an instance manually (I use that in unit tests), but many applications now use dependency injection (DI), such as ASP.NET Core. The Soft Delete library provides an extension method called RegisterSoftDelServicesAndYourConfigurations, which will find and register all of your configuration classes and also register the correct soft delete service for each configuration. The code below shows an example of calling this method inside ASP.NET Core’s startup code.

public void ConfigureServices(IServiceCollection services)
{
    //other setup code left out
    var softLogs = services
       .RegisterSoftDelServicesAndYourConfigurations(
           Assembly.GetAssembly(typeof(ConfigSoftDeleted))
        );
}

This will scan the assembly which contains the ConfigSoftDeleted class and register all the configuration classes it finds there, and also register the correct versions of the single or cascade services. In this example you would have three services configured:

  • ConfigSoftDeleted as SingleSoftDeleteConfiguration<ISingleSoftDelete>
  • SingleSoftDeleteService<ISingleSoftDelete>
  • SingleSoftDeleteServiceAsync<ISingleSoftDelete>

A few features here:

  • You can provide multiple assemblies to scan.
  • If you don’t provide any assemblies, then it scans the assembly that called it.
  • The method outputs a series of logs (see var softLogs in the code) when it finds/registers services. This can be useful for debugging if your soft delete methods don’t work. The listing below shows the output for my use of this library in my Book App.
No assemblies provided so only scanning the calling assembly 'BookApp.UI'
Starting scanning assembly BookApp.UI for your soft delete configurations.
Registered your configuration class ConfigSoftDelete as SingleSoftDeleteConfiguration<ISoftDelete>
SoftDeleteServices registered as SingleSoftDeleteService<ISoftDelete>
SoftDeleteServicesAsync registered as SingleSoftDeleteServiceAsync<ISoftDelete>

5. Calling the soft delete library’s methods

Having registered your configuration(s) you are now ready to use the soft delete methods. In this example I have taken an example from an ASP.NET Core application I built for my book. This code allows users to a) soft delete a Book, b) find all the soft deleted Books, and c) undelete a Book.

NOTE: You can access the ASP.NET Core’s Admin Controller via this link. You can also run this application by cloning the https://github.com/JonPSmith/EfCoreinAction-SecondEdition GitHub repo and selecting branch Part3.

public async Task<IActionResult> SoftDelete(int id, [FromServices] 
    SingleSoftDeleteServiceAsync<ISingleSoftDelete> service)
{
    var status = await service.SetSoftDeleteViaKeysAsync<Book>(id);

    return View("BookUpdated", new BookUpdatedDto(
        status.IsValid ? status.Message : status.GetAllErrors(),
        _backToDisplayController));
}

public async Task<IActionResult> ListSoftDeleted([FromServices] 
    SingleSoftDeleteServiceAsync<ISingleSoftDelete> service)
{
    var softDeletedBooks = await service.GetSoftDeletedEntries<Book>()
        .Select(x => new SimpleBookList{
              BookId = x.BookId, 
              LastUpdatedUtc = x.LastUpdatedUtc, 
              Title = x.Title})
        .ToListAsync();

    return View(softDeletedBooks);
}

public async Task<IActionResult> UnSoftDelete(int id, [FromServices] 
     SingleSoftDeleteServiceAsync<ISingleSoftDelete> service)
{
    var status = await service.ResetSoftDeleteViaKeysAsync<Book>(id);

    return View("BookUpdated", new BookUpdatedDto(
        status.IsValid ? status.Message : status.GetAllErrors(),
        _backToDisplayController));
}

The other feature I left out was the HardDeleteViaKeys method, which would hard delete (i.e., call the EF Core Remove method on) the found entity instance, but only if it had already been soft deleted.

NOTE: As well as the …ViaKeys methods there are the same methods that work on an entity instance.

The Soft Delete library easily implements this example, but coding this yourself isn’t hard. So, let’s look at two more complex examples that bring out some extra features in the Soft Delete library. They are:

  • Handling Query Filters with multiple filter parts
  • Using cascade soft delete to ‘hide’ related information

Handling Query Filters with multiple filter parts

The EF Core documentation on Query Filters gives two main usages for Query Filters: soft delete and multi-tenancy filtering. One of my client’s applications needed BOTH of these at the same time, which is doable but was a bit complex. While the soft delete filter is very important, it’s also critical that the multi-tenant part isn’t forgotten when using the IgnoreQueryFilters method to access the soft deleted entities.

One of the reasons for building this library was to handle applications where you want soft delete and multi-tenancy filtering. And the solution only needs you to add one line to the Soft Delete configuration – see the OtherFilters.Add call in the code below.

public class ConfigSoftDeleteWithUserId : 
    SingleSoftDeleteConfiguration<ISingleSoftDelete>
{
    public ConfigSoftDeleteWithUserId(
        SingleSoftDelDbContext context)
        : base(context)
    {
        GetSoftDeleteValue = entity => 
             entity.SoftDeleted;
        SetSoftDeleteValue = (entity, value) => 
             entity.SoftDeleted = value;
        OtherFilters.Add(typeof(IUserId), entity => 
             ((IUserId)entity).UserId == context.UserId);
    }
}

The OtherFilters.Add method allows you to define one or more extra filter parts, and when filtering for the GetSoftDeletedEntries method, or for the Reset/HardDelete methods, it makes sure these ‘Other Filters’ are applied (if needed).

To test this approach, I use my standard example of an application that sells books, where I want to soft delete a Book entity class (which has no multi-tenant part), and the Order entity class, which has a multi-tenant part, so orders can only be seen by the user they belong to. This means the filter for finding each of the soft deleted entities is different.

Finding soft deleted entries of the Book entity type, which doesn’t have the IUserId interface:

context.Books.IgnoreQueryFilters().Where(b => b.SoftDeleted)

Finding soft deleted entries of the Order entity type, which has the IUserId interface:

context.Orders.IgnoreQueryFilters().Where(o => o.SoftDeleted && o.UserId == context.UserId)

This is automatically done inside the Soft Delete library by dynamically building an expression tree. So the complex part is done inside the library and all you need to do is cut/paste the filter part and call the OtherFilters.Add method inside the configuration class.

Using cascade soft delete to also soft delete dependent relationships

One (small) problem with the single soft delete is it doesn’t work the same way as a normal (hard) delete. A hard delete would delete the entity in the database, and normally the database would also delete any relationship that can’t exist without that first entity (called a dependant relationship). For instance, if you hard deleted a Book entity that had some Reviews, then the database’s constraints would cascade delete all the Reviews linked to that Book. It does this to keep the referential integrity of the database, otherwise the foreign key in the Review table would be incorrect.

Most of the time the fact that a soft delete doesn’t also soft delete the dependant relationships doesn’t matter. For instance, not soft deleting the Reviews when you soft delete a Book most likely doesn’t matter, as no one can see the Reviews because the Book isn’t visible. But sometimes it does matter, which is why I looked at what I would have to do to mimic the database’s cascade deletes but using soft deletes. It turns out to be much more complex than I thought, but the soft delete library contains my implementation of cascade soft deleting.

NOTE: In the previous article I go through the various options you have when soft deleting an entity with dependant relationships – see this link.

While the single soft delete is useful everywhere, the cascade soft delete approach is only useful in specific situations. One that I came across was a company that did complex bespoke buildings. The process required creating detailed quotes for a job, which uses a hierarchical structure (shown as the “Quote View” in the diagram). Some quotes were accepted, and some were rejected, but they needed to keep the rejected quotes as a history of the project.

At the same time, they wanted to know if their warehouse had enough stock to build the quotes they have sent out (shown as the “Stock Check View” in the diagram). Quote 456-1 was rejected, which means it was cascade soft deleted, which soft deletes all the LineItems for Quote 456-1 as well. This means that when the Stock Check code is run it doesn’t see the LineItems from Quote 456-1, so the Stock Check gives the correct value for the valid Quotes.

Using cascade soft delete makes the code for the Stock Check View much simpler, because the cascade soft delete of a quote also soft deletes its LineItems. The code below creates a Dictionary whose Keys are the ProductSkus, with the Values being how many of each are needed.

var requiredProducts = context.Set<LineItem>().ToList()
    .GroupBy(x => x.ProductSku, y => y.NumProduct)
    .ToDictionary(x => x.Key, y => y.Sum());

Running this code against the three quotes in the diagram means that only the dark green LineItems are included – the cascade soft deleted LineItems (shown in light green with a red triangle in them) aren’t included, which is what is needed in this case.

Solving the problem of cascade soft un-delete

There is one main difference between soft delete and the database delete – you can get the soft deleted data back! That’s what we want, but it does cause a problem when using cascade soft delete/un-delete in that you might have already cascade soft deleted some relationships deeper down in the relationships. When you cascade soft un-delete you want the previous cascade soft delete of the deeper relationships to stay as they were.

The solution is to use a delete level number instead of a boolean. I cover this in more detail in the first article and I recommend you read this part of the article, but here is a single diagram that shows how the Soft Delete library’s cascade soft un-delete can return an entity and its dependant relationships back to the point where the last cascade soft delete was applied – in this case the LineItems in the red area will still be soft deleted.

Using the cascade soft delete methods

Using the cascade soft delete versions of the library requires you to:

  1. Set up an interface which adds a property of type byte to take the delete level number.
  2. Inherit the CascadeSoftDeleteConfiguration<TInterface> class in your configuration class. The code below shows you an example.
public class ConfigCascadeDelete : 
    CascadeSoftDeleteConfiguration<ICascadeSoftDelete>
{

    public ConfigCascadeDelete(
        CascadeSoftDelDbContext context)
        : base(context)
    {
        GetSoftDeleteValue = entity => 
            entity.SoftDeleteLevel;
        SetSoftDeleteValue = (entity, value) =>
            entity.SoftDeleteLevel = value;
    }
}

There are the same methods as the single soft delete methods, but they contain the word “Cascade”, for instance SetSoftDeleteViaKeys becomes SetCascadeSoftDeleteViaKeys and so on. All the same features are there, such as handling multiple filters.
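
As a hedged illustration of calling the cascade version (the service type and the async method name shown here are assumed to follow the same naming pattern as the single service – check the library’s documentation for the exact names), soft deleting a Quote and its dependants might look something like this:

public async Task<IActionResult> CascadeSoftDelete(int id, [FromServices]
    CascadeSoftDeleteServiceAsync<ICascadeSoftDelete> service) //type name assumed
{
    //async name assumed from the single soft delete pattern
    var status = await service.SetCascadeSoftDeleteViaKeysAsync<Quote>(id);

    return View("QuoteUpdated",
        status.IsValid ? status.Message : status.GetAllErrors());
}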

Conclusion

I have now released the EfCore.SoftDeleteServices and the library is available on GitHub at https://github.com/JonPSmith/EfCore.SoftDeleteServices and on NuGet as EfCore.SoftDeleteServices. It was quite a bit of work, but I’m pleased with the final library. I have already put it to work in my BookApp.UI example application.

My experience of working on client projects says that soft delete is a “must have” feature, mainly because users sometimes delete something they didn’t mean to. Often the soft delete is shown to users as “delete”, even though it’s a soft delete, with only the admin having the ability to un-soft delete or hard delete.

Let me know how you get on with the library!

Happy coding.

EF Core In depth – Tips and techniques for configuring EF Core

Last Updated: October 27, 2020 | Created: July 17, 2020

This article is about being more efficient at configuring your EF Core DbContext so that it runs fast and safe. As a working freelance developer, I’m always looking for ways to make me a more efficient/faster developer. Configuring a DbContext is really important, but there can be a lot of configuration code; over the years I have found ways to minimise or automate much of the EF Core configuration. This article pulls together lots of configuration approaches I have learnt working with EF Core, and EF6 before that.

I do assume you know EF Core, but I start with a look at EF Core’s configuration of your application’s DbContext to make sure we have the basics before I dig into the various tips and techniques to make you faster and safer.

This article is part of a “EF Core In depth” series. Here is the current list of articles in this series:

Other older articles in this series are

This “EF Core In depth” series is inspired by what I found while updating my book “Entity Framework Core in Action” to cover EF Core 5. I have also added a LOT of new content from my experiences of working with EF Core on client applications over the last 2½ years.

NOTE: There is a GitHub repo at https://github.com/JonPSmith/EfCoreinAction-SecondEdition/tree/Part2 that contains all the code used in this article.

TL;DR – summary

  • EF Core builds a Model of the database based on the DbSet<T> classes and various configuration methods. It does NOT look at the actual database.
  • EF Core uses three approaches to configure your application’s DbContext
    • By Convention, which applies a set of rules to the property types/names to work out a default configuration.
    • Attributes: It will look for certain annotations, like [MaxLength(123)] to add more configurations.
    • Fluent API: Finally, it runs OnModelCreating method in the application’s DbContext where you can place Fluent API commands.
  • Learning/following EF Core’s By Convention rules will save you a LOT of time and code. It is especially good at configuring one-to-many relationships.
  • If you want your database to be quick it is worth defining the SQL type a bit more tightly for certain NET types, like string, DateTime, decimal.
  • There are some more advanced entity types that are useful. I talk about owned types, Table-per-Hierarchy and table splitting.
  • When your application gets big your configuration can be split into per-class configurations, which makes it easier to find/refactor.
  • There is a really helpful technique that can automate some of the configuring. It allows you to define your own By Convention rules and have them applied to all the classes/properties you have defined.

Setting the scene – what is happening when you configure your DbContext

To use EF Core you must create a class that inherits EF Core’s DbContext (I refer to this as your application’s DbContext). In this class you add DbSet<T> properties that set up the mapping between your classes (I refer to these as entity classes) and the database. The following listing is a very basic application’s DbContext without any extra configuration.

public class EfCoreContext : DbContext 
{
    public EfCoreContext(DbContextOptions<EfCoreContext> options)                    
        : base(options) {}                                         

    public DbSet<Book> Books { get; set; }                      
    public DbSet<Author> Authors { get; set; }                  
    public DbSet<PriceOffer> PriceOffers { get; set; }          
}                 

When I talk about “configuring EF Core”, or “configure your DbContext”, I’m talking about a process that EF Core does on the first use of your application’s DbContext. At that point it creates a Model of the database you plan to access, based on your entity classes mapped to the database and any EF Core configuration commands you have provided.

Just to be clear, it never looks at the actual database to build this Model; it only uses the entity classes and any EF Core configuration commands you have added. EF Core’s Model of the database and the actual database need to match, otherwise your application will fail when it tries to access the database.

NOTE: I cover the whole area of how to make changes to a database in the two articles Handling Entity Framework Core migrations: creating a migration – Part 1 and Handling Entity Framework Core migrations: applying a migration – Part 2

The following figure shows the process that EF Core goes through the first time you use your application’s DbContext (later instances of your DbContext use a cached version of the created Model).

EF Core uses three ways to pick up configuration information:

  1. By Convention: When you follow simple rules on property types and names, EF Core will autoconfigure many of the software and database features. For instance
    • A property of NET type string will, by default, map to SQL NVARCHAR(max) NULL.
    • A property with the name Id or <ClassName>Id (e.g. BookId) will be the primary key for this entity class.
  2. Data Annotations: A range of .NET attributes, known as Data Annotations, can be added to entity classes and/or properties to provide extra configuration information. For instance
    • Adding the attribute [MaxLength(100)] on a string property will change the SQL to NVARCHAR(100) NULL.
    • Adding the attribute [Required] on a string property will change the SQL to NVARCHAR(max) NOT NULL.
  3. Fluent API: EF Core has a method called OnModelCreating that’s run when the EF context is first used. You can override this method and add commands, known as the Fluent API. For instance
    • The command modelBuilder.Entity<Book>().HasIndex(p => p.Price) would add a non-unique index to the Price column in the table mapped to the Book entity class.
    • The command modelBuilder.Entity<Book>().Property(p => p.PublishedDate).HasColumnType("date") would change the SQL type from DATETIME2, which has a resolution of 100ns, to the much smaller SQL DATE type that is accurate to one day.

Read on for tips on how to use these three approaches to a) write the minimum of configuration code and b) get a good database design.

Tip: Let EF Core do most of the configuring using By Convention rules

Most of you will already be using the By Convention rules to set up the column names and types. If you have control over the database design (known as its schema), i.e. you can use whatever column names suit you, then this will save you from writing a lot of boring configuration code.

But when it comes to relationships, some developers seem to want to define every relationship. When I first started using EF6 I did just that, but ended up with a lot of code! Once I understood EF Core’s By Convention rules (writing the EF Core In Action book taught me the rules!), I rarely defined a relationship unless I wanted to change the delete behaviour (I talk about delete behaviour later). The relationship rules are pretty simple:

  1. Name your primary key as Id or <ClassName>Id (e.g. BookId).
  2. Use the <ClassName>Id name on your foreign key, because that works with both primary key formats, i.e. Id or <ClassName>Id
  3. Set up the property that links the two entity classes (known as navigational property) using the entity class type (the name doesn’t matter), e.g. ICollection<Review> Reviews { get; set; }

Here is a figure showing a relationship that EF Core’s By Convention will define automatically.
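
To make that concrete, here is a minimal sketch of a Book/Review pair (properties trimmed for illustration) where the By Convention rules give you the one-to-many relationship without any Fluent API:

public class Book
{
    public int BookId { get; set; }                 //primary key by convention
    public string Title { get; set; }

    public ICollection<Review> Reviews { get; set; } //navigational property
}

public class Review
{
    public int ReviewId { get; set; }               //primary key by convention
    public int BookId { get; set; }                 //foreign key by convention
    public int NumStars { get; set; }
}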

Of course, there are some exceptions where you would need Fluent API commands.

  • EF Core can only configure a one-to-one relationship By Convention if both ends of the relationship have navigational properties, otherwise it will think it’s a one-to-many relationship. But one-to-one relationships are a lot less used than one-to-many and many-to-many relationships.
  • If you want to change the delete rules from the By Convention value; for instance, what happens to the Reviews when the Book is deleted – in this case the Reviews would be deleted too. If you didn’t want that to happen then you would have to define the relationship using Fluent API commands and add the OnDelete command.
  • If you have two navigational properties going to the same class, for instance BillingAddress and DeliveryAddress both pointing to the Address entity class, then you do need to configure that manually (but an Owned type would be better for that).
  • Some very advanced things, like setting the constraint name, need the Fluent API.

Overall you want to let EF Core configure as much as you can, as it’s quick and easy. So, learn the rules and trust in EF Core (but unit tests are also good!)

Making your database more efficient

It’s easy to create classes, but entity classes need a little more attention to make sure the database is as fast as it can be. This requires a bit more work on your part. Here are some things to consider.

1. string type properties

By default, EF Core will set the SQL type of a string property to NVARCHAR(MAX) NULL, which works OK, but do you really need space for a 1-Gbyte Unicode string? Here are some suggestions (a short example follows the list):

  • Set the size of the string using the [MaxLength(123)] attribute. NVARCHAR(NNN) is slightly quicker than NVARCHAR(MAX). NOTE: The [MaxLength(123)] attribute is also useful for front-end checking that the input isn’t too long.
  • If you filter or sort on a string, then adding an SQL index is useful. Use the Fluent API command HasIndex() or the new EF Core 5 (preview 6) [Index(nameof(Title))] attribute. NOTE: an index has a limit of 900 bytes, so your NVARCHAR must be 450 or lower.
  • Some strings are 8-bit ASCII, like URLs, so why store/send the extra bytes? Use the Fluent API command IsUnicode(false), which will turn the SQL type from NVARCHAR to VARCHAR.
  • Try adding the [Required(AllowEmptyStrings = false)] attribute on string properties you expect to always have a value. The [Required] part will change the SQL type from NVARCHAR(MAX) NULL to NVARCHAR(MAX) NOT NULL (the AllowEmptyStrings = false part doesn’t affect the database; it is only used in any NET validations).
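
Here is a minimal sketch showing some of these suggestions applied to an illustrative Book class (the MaxLength values are arbitrary, and the IsUnicode(false) part would go in your Fluent API configuration):

using System.ComponentModel.DataAnnotations;

public class Book
{
    public int BookId { get; set; }

    [Required(AllowEmptyStrings = false)] //NVARCHAR(...) NOT NULL
    [MaxLength(256)]                      //NVARCHAR(256) rather than NVARCHAR(MAX)
    public string Title { get; set; }

    [MaxLength(512)]                      //URLs are ASCII, so also call IsUnicode(false) via the Fluent API
    public string ImageUrl { get; set; }
}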

2. DateTime type properties

By default, NET’s DateTime type is saved as SQL DATETIME2, which has a resolution of 100ns and takes up 7 bytes. In some cases that is great, but the SQL DATE type is only 3 bytes. As well as saving bytes, a sort or filter on a DATE type is going to be much quicker than on a DATETIME2 type.

NOTE: If you save a DateTime that is using DateTimeKind.Utc, then you should know that the DateTimeKind of a DateTime is not preserved in the database. That matters if the data is going to be sent to your front-end as JSON, as the JSON datetime string won’t end with a “Z” and your front-end might get the date offset wrong. You can fix this using EF Core’s ValueConverters – (add a comment to this article if you want to know how to do that).
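
One possible shape of that ValueConverter fix is shown below – a minimal sketch, where the LastUpdatedUtc property name is just for illustration:

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder.Entity<Book>()
        .Property(p => p.LastUpdatedUtc)
        .HasConversion(
            timeToDb => timeToDb,                  //nothing to change on the way in
            timeFromDb => DateTime.SpecifyKind(    //mark the value as UTC on the way out
                timeFromDb, DateTimeKind.Utc));
}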

3. decimal type properties

By default, a NET decimal type is saved as DECIMAL(18,2), which is SQL Server’s default; this gives it 16 digits before the decimal point and 2 after, and it takes up 9 bytes. If you’re dealing with money that might be too big, and DECIMAL(9,2) would work and that’s only 5 bytes.

On the other hand, if you’re dealing with percent, then having a precision of 2 decimal places might not be enough, and 16 digits before the decimal point is too much.

In both cases it’s worth changing the default precision (i.e. number of digits stored) and scale (i.e. number of digits after the decimal point). You can do that via the [Column(TypeName = "decimal(9,2)")] attribute or the Fluent API command HasColumnType("decimal(9,2)"). But in EF Core 5 there is a really nice Fluent API called HasPrecision(9,2), which is easier.

4. Avoid expression body properties with EF Core

In a normal class, having a property that contains code (referred to as an expression body definition), as shown below, is the right thing to do.

public class MyClass
{
    public string FirstName { get; set; }
    public string LastName { get; set; }

    public string FullName => $"{FirstName} {LastName}";
}

The problem comes when you want to sort/filter on the FullName – EF Core will throw an exception because there is no FullName column to sort/filter on. So you need to add an actual FullName property that will be mapped to the database, and you either set the properties via a constructor, or use EF Core’s backing fields to capture a software change to the FirstName/LastName and set the FullName.

public class MyClassImproved
{
    private string _firstName;
    private string _lastName;

    public string FirstName
    {
        get => _firstName;
        set
        {
            _firstName = value;
            FullName = $"{FirstName} {LastName}";
        }
    }

    public string LastName
    {
        get => _lastName;
        set
        {
            _lastName = value;
            FullName = $"{FirstName} {LastName}";
        }
    }

    public string FullName { get; set; }
}

NOTE: Another option in EF Core 5 (preview 5) is stored (persisted) computed columns, which allow you to have a FullName column that runs the SQL command FirstName + ‘ ‘ + LastName whenever the row is created or updated. It’s efficient, and SQL Server allows indexes on persisted computed columns too.
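
As a rough sketch of that EF Core 5 alternative (it assumes FullName is changed to a normal mapped property, e.g. public string FullName { get; private set; }, and the SQL is SQL Server syntax):

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    //EF Core 5: a stored (persisted) computed column calculated by the database
    modelBuilder.Entity<MyClass>()
        .Property(p => p.FullName)
        .HasComputedColumnSql("[FirstName] + ' ' + [LastName]", stored: true);
}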

Let’s talk about some more advanced entity types

Using normal entity classes with links to other entity classes works, but there are some variants of classes that can make your life easier and can improve performance. Here are some specific EF Core class types

  • Owned types – useful for common data that is used in lots of places, e.g. Address
  • Table per hierarchy (TPH)—This maps a series of inherited classes to one table; for instance, classes called Dog, Cat, and Rabbit that inherit from the Animal class
  • Table splitting – Lets you map multiple classes to a table. Useful if you want a Summary part and a Detailed part.

I have used Owned types a lot and they are great for keeping a specific group of data together. I have also used TPH quite a bit on client systems where there is common data with a few differences – really worth looking at. I haven’t used table splitting much because I normally use Select queries to pick the exact properties/columns I want anyway.

I’m only going to cover the Owned types because this article is pretty long already, and I still want to show more things.

Owned entity types

Owned entity types are classes that you can add to an entity class and the data in owned types will be combined into the entity class’s table. To make this more concrete, think about an address. An address is a group of Street, City, State etc. properties that, on their own, aren’t that useful as they need to link to a company, a user, a delivery location and so on.  

The owned type class doesn’t have its own primary key, so it doesn’t have an identity of its own but relies on the entity class that “owns” it for its identity. In DDD terms, owned types are known as value objects. This also means you can use an owned type multiple times in an entity class – see the example OrderInfo class below with two addresses in it.

public class OrderInfo
{
    public int OrderInfoId { get; set; }
    public string OrderNumber { get; set; }

    public Address BillingAddress { get; set; } 
    public Address DeliveryAddress { get; set; }
}

The Address class must be marked as an Owned type, either by the [Owned] attribute or via the Fluent API. The code below uses the [Owned] attribute.

[Owned]                                        
public class Address                           
{
    public string NumberAndStreet { get; set; }
    public string City { get; set; }
    public string ZipPostCode { get; set; }                       
    public string CountryCodeIso2 { get; set; }
}
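
If you prefer the Fluent API over the [Owned] attribute, a minimal sketch of the equivalent configuration would be:

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    //mark both Address properties on OrderInfo as owned types
    modelBuilder.Entity<OrderInfo>().OwnsOne(p => p.BillingAddress);
    modelBuilder.Entity<OrderInfo>().OwnsOne(p => p.DeliveryAddress);
}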

Now when you look at the SQL table generated by EF Core it looks like this

CREATE TABLE [Orders] (
    [OrderInfoId] int NOT NULL IDENTITY,
    [OrderNumber] nvarchar(max) NULL,
    [BillingAddress_City] nvarchar(max) NULL,
    [BillingAddress_NumberAndStreet] nvarchar(max) NULL,
    [BillingAddress_ZipPostCode] nvarchar(max) NULL,
    [BillingAddress_CountryCodeIso2] nvarchar(2) NULL,
    [DeliveryAddress_City] nvarchar(max) NULL,
    [DeliveryAddress_NumberAndStreet] nvarchar(max) NULL,
    [DeliveryAddress_ZipPostCode] nvarchar(max) NULL,
    [DeliveryAddress_CountryCodeIso2] nvarchar(2) NULL,
    CONSTRAINT [PK_Orders] PRIMARY KEY ([OrderInfoId])
);

As you can see, the data for the two Address properties, BillingAddress and DeliveryAddress, is added to the Orders table.

A few things to know about owned types:

  • A property with a type that is Owned Type in an entity can be null, in which case all the columns in the table are null.
  • If an Owned Type contains a non-nullable property it is still stored in a nullable column in the database. That’s done to handle an Owned Type property being null.
  • Nullable Owned Type properties were added in EF Core 3, but the SQL command wasn’t ideal. This is fixed in EF Core 5.
  • You can map an Owned Type to a separate table – I haven’t described that.

Tip: How to organise your configuration code

The Fluent API commands go in the OnModelCreating method in your application’s DbContext. For small projects that works fine, but once you start to get more and more Fluent API configurations it can get messy and hard to find things. One solution is to use the IEntityTypeConfiguration<TEntity> type. This allows you to have a configuration class for each entity class that needs it – see the code below

internal class BookConfig : IEntityTypeConfiguration<Book>
{
    public void Configure
        (EntityTypeBuilder<Book> entity)
    {
        entity.Property(p => p.PublishedOn)           
            .HasColumnType("date");    
        entity.HasIndex(x => x.PublishedOn);                         

        //HasPrecision is an EF Core 5 method
        entity.Property(p => p.Price).HasPrecision(9,2);                       

        entity.Property(x => x.ImageUrl).IsUnicode(false);                        

        entity.HasQueryFilter(p => !p.SoftDeleted);   

        //----------------------------
        //one-to-one with only one navigational property must be defined

        entity.HasOne(p => p.Promotion)               
            .WithOne()                                
            .HasForeignKey<PriceOffer>(p => p.BookId);
    }
}

You have a few options for how to run these. You can call them inside your OnModelCreating method using the code below

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder.ApplyConfiguration(new BookConfig());
    modelBuilder.ApplyConfiguration(new BookAuthorConfig());
    //… and so on.
}

Or you can use the ApplyConfigurationsFromAssembly command to find and run all your IEntityTypeConfiguration<TEntity> classes. The code below assumes those classes are in the same project as your application’s DbContext

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder.ApplyConfigurationsFromAssembly(
          Assembly.GetExecutingAssembly());
}

Adding your configuration rules automatically

One of the big things I learnt is how to automatically apply Fluent API commands to certain classes/properties. For example, I will show you how to define your own By Convention rules; for instance, any entity class property of type decimal whose name contains the string “Price” has its SQL type set to DECIMAL(9,2).

This relies on the modelBuilder.Model.GetEntityTypes() method available in the OnModelCreating method. This provides a collection of all the entity classes that EF Core has found at this stage, and within that you can gain access to the properties in each entity class.

The piece of code below, taken from an application’s DbContext, contains two rules:

  • Properties of type decimal with “Price” in the property name are set to DECIMAL(9,2).
  • Properties of type string whose name ends with “Url” are set to VARCHAR.
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    foreach (var entityType in modelBuilder.Model.GetEntityTypes()) 
    {
        foreach (var entityProperty in entityType.GetProperties())  
        {
            if (entityProperty.ClrType == typeof(decimal)           
                && entityProperty.Name.Contains("Price"))           
            {                                                       
                entityProperty.SetPrecision(9);                     
                entityProperty.SetScale(2);                         
            }                                                       

            if (entityProperty.ClrType == typeof(string)            
                && entityProperty.Name.EndsWith("Url"))             
            {                                                       
                entityProperty.SetIsUnicode(false);                  
            }                                                       
        }
    } 
}

Those are just two rules that I use, plus one to sort out the DateTimeKind on DateTime properties whose name ends in “Utc”. I also use the Query Filter setup shown in the last article to set up all my Query Filters; that way I can’t forget to set up a Query Filter.
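
As a sketch of what that “Utc” rule might look like (it assumes the SetValueConverter metadata method, available in recent EF Core versions, and needs a using for Microsoft.EntityFrameworkCore.Storage.ValueConversion), you would add something like this to the same loop:

var utcConverter = new ValueConverter<DateTime, DateTime>(
    timeToDb => timeToDb,
    timeFromDb => DateTime.SpecifyKind(timeFromDb, DateTimeKind.Utc));

foreach (var entityType in modelBuilder.Model.GetEntityTypes())
{
    foreach (var entityProperty in entityType.GetProperties())
    {
        if (entityProperty.ClrType == typeof(DateTime)
            && entityProperty.Name.EndsWith("Utc"))
        {
            entityProperty.SetValueConverter(utcConverter); //mark the value as UTC on read
        }
    }
}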

Conclusion

I have explained how EF Core builds a Model of the database you want to access using three different configuration approaches. Learning EF Core’s By Convention rules can cut down a LOT of configuration code, which saves you time and makes your configuration easier to understand.

I then talked about some of the NET types where you might need to add some configuration code to make your database more efficient. I also touched on some of the different types of class arrangement that EF Core can handle, especially the Owned Type.

Finally, I covered different ways to configure your application’s DbContext, including a very helpful way to automate some of your configurations. I really like the automating configuration approach as it makes sure I haven’t missed anything, and it reduces the amount of configuration code I need to write.

Happy coding.

EF Core In depth – Soft deleting data with Global Query Filters

Last Updated: February 20, 2021 | Created: July 2, 2020

This article is about a way to seemingly delete data, but in fact EF Core hides it for you and you can get it back if you need to. This type of feature is known as soft delete, and it has many good features, plus some issues to be aware of too. In this article I use EF Core to implement the normal soft delete and a more powerful (and complex) cascade soft delete. Along the way I give you tips on how to write reusable code to speed up your development of a soft delete solution.

I do assume you know EF Core, but I start with a look at using EF Core to make sure we have the basics of deleting and soft deleting covered before looking at solutions.

This article is part of a “EF Core In depth” series. Here is the current list of articles in this series:

Other older articles in this series are

This “EF Core In depth” series is inspired by what I found while updating my book “Entity Framework Core in Action” to cover EF Core 5. I have also added a LOT of new content from my experiences of working with EF Core on client applications over the last 2½ years.

NOTE: There is a GitHub repo at https://github.com/JonPSmith/EfCore.SoftDeleteServices that contains all the code used in this article.

UPDATE – new library

I have just released the library EfCore.SoftDeleteServices which I talk about in this article. See the new article Introducing the EfCore.SoftDeleteServices library to automate soft deletes to read how this library helps you implement soft delete in your applications.

TL;DR – summary

  • You can add a soft delete feature to your EF Core application using Global Query Filters (referred to as Query Filters from now on).
  • The main benefits of using soft delete in your application are inadvertent deletes can be restored and history is preserved.
  • There are three parts to adding the soft delete feature to your application
    • Add a new soft delete property to every entity class you want to soft delete.
    • Configure the Query Filters in your application’s DbContext
    • You create code to set and reset the soft delete property.
  • You can combine soft delete with other uses of Query Filters, like multi-tenant uses but you need to be more careful when you are looking for soft deleted entries.
  • Don’t soft delete a one-to-one entity class as it can cause problems.
  • For entity classes that have relationships you need to consider what should happen to the dependant relationships when the top entity class is soft deleted.
  • I introduce a way to implement a cascade soft delete that works for entities where you need their dependant relationships soft deleted too.

Setting the scene – why soft delete is such a good idea

When you hard delete (I use the term hard delete from now on, so it’s obvious what sort of delete I’m talking about), then it’s gone from your database. Also, hard deleting might also hard delete rows that rely on the row you just hard deleted (known as dependant relationships). And as the saying goes, “When it’s gone, then it’s gone” – no getting it back unless you have a backup.

But nowadays we are more used to “I deleted it, but I can get it back” – On windows it’s in the recycle bin, if you deleted some text in an editor you can get it back with ctrl-Z, and so on. Soft delete is EF Core’s version of windows recycle bin – the entity class (the term for classes mapped to the database via EF Core) is gone from normal usage, but you can get it back.

Two of my client’s applications used soft delete extensively. Any “delete” the normal user did set the soft delete flag, but an admin user could reset the soft delete flag to get the item back for the user. In fact, one of my clients used the terms “delete” for a soft delete and “destroy” for a hard delete. The other benefit of keeping soft-deleted data is history – you can see what changed in the past even if it’s soft deleted. Most clients keep soft deleted data in the database for some time and only backup/remove that data many months (years?) later.

You can implement the soft delete feature using EF Core Query Filters. Query Filters are also used for multi-tenant systems, where each tenant’s data can only be accessed by users who belong to the same tenant. This means EF Core Query Filters are designed to be very secure when it comes to hiding things – in this case data that has been soft deleted.

I should also say there are some down sides of using soft delete. The main one is performance – an extra, hidden SQL WHERE clause is included in every query of entity classes using soft delete.

There is also a difference between how soft delete handles dependant relationships when compared with hard delete. By default, if you soft delete an entity class then its dependant relationships are NOT soft deleted, whereas a hard delete of an entity class would normally delete the dependant relationships. This means if I soft delete a Book entity class then the Book’s Reviews will still be visible, which might be a problem in some cases. At the end of this article I show you how to handle that and talk about a prototype library that can do cascade soft deletes.

Adding soft delete to your EF Core application

In this section I’m going to go through each of the steps to add soft delete to your application

  1. Add soft delete property to your entity classes that need soft delete
  2. Add code to your DbContext to apply a query filter to these entity classes
  3. How to set/reset Soft Delete

In the next sections I describe these stages in detail. I assume a typical EF Core class with normal read/write properties, but you can adapt it to other entity class styles, like Domain-Driven Design (DDD) styled entity classes.

1. Adding a soft delete property to your entity classes

For the standard soft delete implementation, you need a boolean flag to control soft delete. For instance, here is a Book entity with a SoftDeleted property highlighted.

public class Book : ISoftDelete                   
{
    public int BookId { get; set; }
    public string Title { get; set; }
    //… other properties left out to focus on Soft delete

    public bool SoftDeleted { get; set; }
}

You can tell by its name, SoftDeleted, that if it is true, then it’s soft deleted. This means when you create a new entity it is not soft deleted.

The other thing I added was an ISoftDelete interface to the Book class (line 1). This interface says the class must have a public SoftDeleted property which can be read and written to. This interface is going to make it much easier to configure the soft delete query filters in your DbContext.
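
For reference, the interface is tiny – it just defines the SoftDeleted property:

public interface ISoftDelete
{
    bool SoftDeleted { get; set; }
}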

2. Configuring the soft delete query filters in your DbContext

You must tell EF Core which entity classes need a query filter and provide a query which will be true if you want the entity to be seen. You can do this manually using the following code in your DbContext – see the HasQueryFilter line in the following listing.

public class EfCoreContext : DbContext
{
    public EfCoreContext(DbContextOptions<EfCoreContext> options)
        : base(options)                                           
    {}
                        
    //Other code left out to focus on Soft delete

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        //Other configuration left out to focus on Soft delete

        modelBuilder.Entity<Book>().HasQueryFilter(p => !p.SoftDeleted);
    }
}

That’s fine but let me show you a way to automate adding query filters. This uses

  1. The modelBuilder.Model.GetEntityTypes() feature available in the OnModelCreating method
  2. A little bit of generic magic to create the correct query filter

Here are the two parts:

1. Automating the configuring of the soft delete query filters

The OnModelCreating method in your DbContext is where you can configure EF Core via what are known as Fluent API configuration commands – you saw that in the last listing. But there is also a way you can look at each entity class and decide if you want to configure it.

In the code below you can see the foreach loop that goes through each entity class in turn. You will see a test to see if the entity class implements the ISoftDelete interface, and if it does it calls an extension method I created to configure a query filter with the correct soft delete filter.

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    //other manual configurations left out       

    foreach (var entityType in modelBuilder.Model.GetEntityTypes())
    {
        //other automated configurations left out
        if (typeof(ISoftDelete).IsAssignableFrom(entityType.ClrType))
        {
            entityType.AddSoftDeleteQueryFilter();      
        }    
    }
}

2. Creating the AddSoftDeleteQueryFilter extension method

There are many configurations you can apply directly to the type that the GetEntityTypes method returns, but setting up the Query Filter needs a bit more work. That’s because the LINQ query in the Query Filter needs the type of the entity class to create the correct LINQ expression. For this I created a small extension class that can dynamically create the correct LINQ expression to configure the Query Filter, and it also adds an index to that property.

public static class SoftDeleteQueryExtension
{
    public static void AddSoftDeleteQueryFilter(
        this IMutableEntityType entityData)
    {
        var methodToCall = typeof(SoftDeleteQueryExtension)
            .GetMethod(nameof(GetSoftDeleteFilter),
                BindingFlags.NonPublic | BindingFlags.Static)
            .MakeGenericMethod(entityData.ClrType);
        var filter = methodToCall.Invoke(null, new object[] { });
        entityData.SetQueryFilter((LambdaExpression)filter);
        entityData.AddIndex(entityData.
             FindProperty(nameof(ISoftDelete.SoftDeleted)));
    }

    private static LambdaExpression GetSoftDeleteFilter<TEntity>()
        where TEntity : class, ISoftDelete
    {
        Expression<Func<TEntity, bool>> filter = x => !x.SoftDeleted;
        return filter;
    }
}

I really like this because it a) saves me time, and b) means I can’t forget to configure a query filter.

3. How to set/reset Soft Delete

Setting the SoftDeleted property to true is easy – the user picks an entry and clicks “Soft Delete”, which sends back the entity’s primary key. Then the code to implement that is:

var entity = context.Books.Single(x => x.BookId == id);
entity.SoftDeleted = true;
context.SaveChanges();

Resetting the SoftDeleted property is a little bit more complex. First you most likely want to show the user a list of JUST the soft deleted entities – think of it as showing the trash can/recycle bin for an individual entity class type, e.g. Book. To do this you need to add the IgnoreQueryFilters method to your query, which means you will get ALL the entities, ones that aren’t soft deleted and ones that are, but you then pick out the ones where the SoftDeleted property is true.

var softDelEntities = _context.Books.IgnoreQueryFilters()
    .Where(x => x.SoftDeleted)
    .ToList();

And when you get a request to reset the SoftDeleted property, this typically contains the entity class’s primary key. To load this entry you need to include the IgnoreQueryFilters method in your query to get the entity class you want to reset.

var entity = context.Books.IgnoreQueryFilters()
     .Single(x => x.BookId == id);
entity.SoftDeleted = false;
context.SaveChanges();

Things to be aware of if you use Soft delete

First, I should say that Query Filters are very secure, by that I mean if the query filter returns false then that specific entity/row won’t be returned in a query, a Find, an Include of a relationship etc. You can get around it by using direct SQL, but other than that EF Core is going to hide things that you soft delete.

But there are a couple of things you do need to be aware of.

Watch out for mixing soft delete with other Query Filter usages

Query Filters are great for soft delete, but Query Filters are even better for controlling access to groups of data. For instance, say you wanted to build a web application that to provide a service, like payroll, to lots of companies. In that case you need make sure that company “A” couldn’t see company “B” data, and vis versa. This type of system is called a multi-tenant application, and Query Filters are a perfect fit for this.

NOTE: See my article Part 2: Handling data authorization in ASP.NET Core and Entity Framework Core for using query filters to control access to data.

The problem is you are only allowed one query filter per entity type, so if you want to use soft delete with a multi-tenant system then you must combine both parts to form the query filter – here is an example of what the query filter might look like

modelBuilder.Entity<MyEntity>()
    .HasQueryFilter(x => !x.SoftDeleted 
                       && x.TenantId == currentTenantId);

That works fine, but when you use the IgnoreQueryFilters method, say to reset a soft deleted flag, then it ignores the whole query filter, including the multi-tenant part. So, if you’re not careful you could show another tenant’s data too!

The answer is to build yourself an application-specific IgnoreSoftDeleteFilter method something like this.

public static IQueryable<TEntity> IgnoreSoftDeleteFilter<TEntity>(
    this IQueryable<TEntity> baseQuery, string currentTenantId)
    where TEntity : class, ITenantId
{
    return baseQuery.IgnoreQueryFilters()
        .Where(x => x.TenantId == currentTenantId);
}

This ignores all the filters and then adds back the multi-tenant part of the filter. That will make it much easier to safely handle showing/resetting soft deleted entities.
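
For example, safely listing the soft deleted entries for the current tenant then becomes something like this (a sketch, assuming an Orders DbSet whose entity class implements ITenantId and has the SoftDeleted property):

var softDeletedOrders = context.Orders
    .IgnoreSoftDeleteFilter(currentTenantId) //removes all filters, then re-applies the tenant part
    .Where(x => x.SoftDeleted)               //pick out just the soft deleted entries
    .ToList();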

Don’t soft delete a one-to-one relationship

I was called in to help on a very interesting client system that used soft delete on every entity class. My client had found that you really shouldn’t soft delete a one-to-one relationship. The problem he found was if you soft delete a one-to-one relationship and try to add a replacement one-to-one entity, then it fails. That’s because a one-to-one relationship has a unique foreign key and that is already set by the soft deleted entity so, at the database level, you just can’t provide another one-to-one relationship because there is one already.

One-to-one relationships are rare, so it might not be a problem in your system. But if you really need to soft delete a one-to-one relationship, then I suggest turning it into a one-to-many relationship where you make sure only one of the entities has its soft delete turned off, which I cover in the next problem area.

Handling multiple versions where some are soft deleted

There are business cases where you might create an entity, then soft delete it, and then create a new version. For example, say you were creating an invoice for order 1234, then you are told the order has been stopped, so you soft delete it (that way you keep the history). Then later someone else (who doesn’t know about the soft deleted version) is told to create an invoice for 1234. Now you have two versions of invoice 1234. For something like an invoice that could cause a problem business-wise, especially if someone reset the soft deleted version.

You have a few ways to handle this:

  • Add a LastUpdated property of type DateTime to your invoice entity class and the latest, not soft-deleted, entry is the one to use.
  • Each new entry has a version number, so in our case the first invoice would be 1234-1 and the second would be 1234-2. Then, like the LastUpdated version, the invoice with the highest version number, and is not soft deleted, is the one to use.
  • Make sure there is only one not soft-deleted version by using a unique filtered index. This works by creating a unique index for all entries that aren’t soft deleted, which means you would get an exception if you tried to reset a soft-deleted invoice when there was an existing non-soft deleted invoice already there. But at the same time, you could have lots of soft-deleted versions for your history. Microsoft SQL Server, PostgreSQL and SQLite have this feature (PostgreSQL and SQLite call it partial indexes), and I am told you can do something like this in MySQL too. The code below is the SQL Server version of a filtered unique index.
CREATE UNIQUE INDEX UniqueInvoiceNotSoftDeleted  
ON [Invoices] (InvoiceNumber)  
WHERE SoftDeleted = 0  
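
If you create your database via EF Core migrations, a minimal sketch of defining the same filtered unique index with the Fluent API (assuming an Invoice entity class with an InvoiceNumber property) would be:

modelBuilder.Entity<Invoice>()
    .HasIndex(x => x.InvoiceNumber)
    .IsUnique()
    .HasFilter("[SoftDeleted] = 0"); //SQL Server filtered index syntax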

NOTE: For handling the exception that would happen with the unique index issue see my article called “Entity Framework Core – validating data and catching SQL errors” which shows you how to convert a SQL exception into a user-friendly error string.

What about relationships?

Up to now we have been looking at soft deleting/resetting a single entity, but EF Core is all about relationships. So, what should I do about any relationships linked to the entity class that I just soft deleted? To help us, let’s look at two different relationships that have different business needs.

Relationship example 1 – A Book with its Reviews

In my book “Entity Framework Core in Action” I build a super-simple book selling web site with books, author, reviews etc. And in that application, I can soft delete a Book. It turns out that once I delete the Book then there really isn’t another way to get to the Reviews. So, in this case I don’t have to worry about the Reviews of a soft deleted book.

But to make things interesting in chapter 5, which is about using EF Core with ASP.NET Core, I added a background task that counts the number of reviews. Here is the code I wrote to count the Reviews

var numReviews = await context.Set<Review>().CountAsync();

This, of course, gave the same count irrespective of whether the Book is soft deleted, which is different to what happens if I hard deleted the Book (because that would also delete the book’s Reviews). I cover how to get around this problem later.

Relationship example 2 – A Company with its Quotes

In this example I have many companies that I sell to and each Company has a set of Quotes we sent to that company. This is the same one-to-many relationship that the Book/Reviews has, but in this case, we have a list of Companies AND a separate list of Quotes. So, if I soft delete a Company then all the Quotes attached to that company should be soft deleted too.

I have come up with three useful solutions to both soft delete relationships examples I have just described.

Solution 1 – do nothing because it doesn’t matter

Sometimes it doesn’t matter that you soft deleted something, and its relationships are still available. Until I added the background task that counts Reviews my application worked fine if I soft deleted a book.

Solution 2 – Use the Aggregates/Root approach

The solution to the background task Reviews count I used was to apply a Domain-Driven Design (DDD) approach called an Aggregate. This says that you get a grouping of entities that work together, in this case the Book, Review, and the BookAuthor linking table to the Author. In a group like this there is a Root entity, in this case the Book.

What Eric Evans, who is the person that defined DDD, says is you should always access the aggregates via the Root aggregate. There are lots of DDD reasons for saying that, but in this case it also solves our soft delete issue, as if I only get the Reviews through the Book, then when the Book is soft deleted the Reviews count is gone too. So, the code below is the replacement to go in the background task’s Reviews count

var numReviews = await context.Books
                   .SelectMany(x => x.Reviews).CountAsync();

You could also do a version of the review count query to list the Quotes via the Company, but there is another option – mimicking the way that the database handles cascade deletes, which I cover next.

Solution 3 – mimicking the way that cascade deletes works

Databases have a delete setting called CASCADE, and EF Core has two DeleteBehaviours, Cascade and ClientCascade. These behaviours cause the hard delete of a row to also hard delete any rows that rely on that row. For instance, in my book-selling application the Book is what is called the principal entity, and the Review and the BookAuthor linking table are dependant entities because they rely on the Book’s Primary key. So, if you hard delete a Book then all the Review and BookAuthor rows linked to that Book row are deleted too. And if those dependant entities had their own dependants, then they would be deleted too – the delete cascades down all the dependant entities.

So, if we duplicate that cascade delete down the dependant entities, but by setting the SoftDeleted property to true, then it would soft delete all the dependants too. That works, but it gets a bit more complex when you want to reset the soft delete. Read the next section for what you really need to do.
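
As a rough sketch of the idea (it assumes the Review entity class also has a SoftDeleted property), a hand-rolled cascade soft delete of a Book and its Reviews might look like this:

var book = context.Books
    .Include(b => b.Reviews)           //load the dependants too
    .Single(b => b.BookId == id);

book.SoftDeleted = true;               //soft delete the principal entity
foreach (var review in book.Reviews)
{
    review.SoftDeleted = true;         //cascade the soft delete down to each dependant
}
context.SaveChanges();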

Building solution 3 – Cascade SoftDeleteService

I decided I wanted to write a service that would provide a cascade soft delete solution. Once I started to really build this, I found all sorts of interesting things that I had to solve, because when we reset the soft delete we want the related entities to come back to their original soft deleted state. It turns out that is a bit more complex, so let’s first explore the problem I found with an example.

Going back to our Company/Quotes example let’s see what happens if we do cascade the setting of the SoftDeleted boolean down from the Company to the Quotes (hint – it doesn’t work in some scenarios). The starting point is we have a company called XYZ, which has two quotes XYZ-1 and XYZ-2. Then:

What                                     | Company  | Quotes
Starting                                 | XYZ      | XYZ-1, XYZ-2
Soft delete the quote XYZ-1              | XYZ      | XYZ-2
Soft delete Company XYZ                  | – none – | – none –
Reset the soft delete on the company XYZ | XYZ      | XYZ-1 (wrong!), XYZ-2

What has happened here is when I reset Company XYZ it also resets ALL the Quotes, and that’s not what the original state was. It turns out we need a byte, not a boolean so that we can know what to reset and what to keep still soft deleted.

What we need to do is have a soft delete level, where the level tells you how far down the soft delete setting was set. Using this we can work out whether we should reset the soft delete or not. This gets pretty complex, so I have a figure that shows how this works. Light coloured rectangles represent entities that are soft deleted, with the change from the last step shown in red.

So, you can handle cascade soft deletes/resets and it works really well. There are lots of little rules to cover in the code, like you can’t start a reset of an entity if its SoftDeleteLevel isn’t 1, because a higher-level entity soft deleted it, and I have tried to build in checks/workarounds for these issues.

I think this cascade soft delete approach is useful and I have built some prototype code to do this, but it’s going to take quite a bit more work to turn it into a NuGet library that can work with any system (here is my current list of things to do).

The soft delete library is out now. Here are some useful links

Conclusion

Well we have well and truly looked at soft delete and what it can (and cannot) do. As I said at the beginning, I have used soft delete on two of my client’s systems and it makes so much sense to me. The main benefits are inadvertent deletes can be restored and history is preserved. The main downside is the soft delete filter might slow queries down but adding an index on the soft deleted property will help.

I know from my experiences that soft delete works really well in business applications. I also know that cascade soft deletes would have helped in one of my client’s systems which had some hierarchical parts – deleting a higher level would then mark all child parts as soft deleted too, which would make things faster when querying the data.

The EfCore.SoftDeleteServices library is out now. I had problems with the Git setup (my fault, not GitHub’s fault) so I renamed the old repo to -Old and created a new EfCore.SoftDeleteServices repo. That means I lost all the stars people had added to say they wanted the library. Thanks to those who starred the old repo – thank you for your support.

Happy coding.