Evolving modular monoliths: 3. Passing data between bounded contexts

Last Updated: May 17, 2021 | Created: May 17, 2021

This article describes the different ways you can pass data between isolated sections of your code, known in DDD as bounded contexts. The first two articles used bounded contexts to modularize our monolith application so, when we implement the communication paths between bounded contexts, we don’t want to compromise this modularization.

DDD has a lot to say about the design of communication between bounded contexts, and .NET provides some tools to implement these communication channels in modern applications. In this article I describe four different approaches to communicating between bounded contexts, with varying levels of isolation.

This article is part of the Evolving Modular Monoliths series, the articles are:

TL;DR – summary

  • DDD says that your application should be broken up into separate parts (DDD terms: bounded contexts or domains) and these bounded contexts should be isolated from each other so that each bounded context can focus on its particular business group.
  • DDD also describes various ways to communicate between bounded contexts based on the business needs, but doesn’t say much about how they can be implemented.
  • These articles are about .NET monolith applications and describe various ways to implement communication paths between two bounded contexts:
    • Example 1: exchange data via a common database
    • Example 2: exchange data via a method call
    • Example 3: exchange data using a message broker
    • Example 4: communicating from new code to a legacy application
  • At the end of example 1 there is information on how to create EF Core DbContexts for each bounded context with individual database migrations.
  • The conclusion gives the pros, cons and limitations of each communication example.

What DDD says about communicating between bounded contexts

Just as DDD’s bounded contexts helped us when breaking up our monolith into modules, DDD can also help with mapping the communications between bounded contexts. DDD defines seven ways to map data when passing it between bounded contexts (read this article for an explanation of each type).

You might have thought that the mappings between two bounded contexts should always be isolated, but DDD recognises that isolation comes at the cost of more development time, and possibly slower communications. Therefore, the seven DDD mapping approaches run from tight coupling right up to complete isolation between the two ends of the communication. DDD warns us that a mapping design that tightly links the two bounded contexts can cause problems when you want to refactor or improve one of them.

NOTE: I highly recommend Eric Evans’ 30-minute talk about bounded contexts at DDD Europe 2020.

Later in this article I show a number of implementations that use various DDD mapping approaches.

The tools that .NET provides for communicating between bounded contexts

DDD provides an architectural view of communicating between bounded contexts, but .NET provides the tools to implement the mapping and communication. The fact that the application is a monolith makes the implementations simple and fast. Here are three approaches:

  • Having two bounded contexts map to the same data in the database.
  • Calling a method in another bounded context using dependency injection (DI).
  • Using a message broker to call a method in another bounded context.

I show examples of all three of these approaches and extract the pros, cons and limitations of each. I also add a fourth example that looks at how you can introduce a modular monolith architecture into an existing application whose design is more like “a ball of mud”.

Example 1: Exchanging data via the database

In a monolith you usually have one database used by all the bounded contexts, and this provides a way to exchange data between them. In a modular monolith you want each bounded context to have its own section of the database that it works with, and you can do that with EF Core. The fact that there is one database still allows you to exchange data by sharing tables/columns in the database.

An example of this approach in the BookApp application is that when the Orders bounded context gets a user’s order it only has the book’s SKU (Stock Keeping Unit), but it needs the book’s price, title etc. Now, the BookApp.Books part of the database has that data in its Books SQL table, so the BookApp.Orders bounded context could map to that table too. But this tightly links the BookApp.Books and BookApp.Orders bounded contexts.

One way to reduce the tight linking is to have the …Orders bounded context map only to the few columns it needs. That way the …Books bounded context could add more columns and relationships without affecting the …Orders bounded context. Another thing you can do is make the …Orders mapping to the Books table read-only by using EF Core’s ToView configuration command. That makes it completely clear that the …Orders bounded context isn’t in charge of this data. The figure below shows this setup.
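To give a feel for the EF Core side of this, here is a minimal sketch of what the …Orders mapping might look like. The class and property names (BookView, BookId, Title, ActualPrice) are illustrative names I have made up, not the actual BookApp code:

using Microsoft.EntityFrameworkCore;

// A read-only copy of the few Book columns the ...Orders bounded context needs
public class BookView
{
    public int BookId { get; set; }
    public string Title { get; set; }
    public decimal ActualPrice { get; set; }
}

public class OrderDbContext : DbContext
{
    public OrderDbContext(DbContextOptions<OrderDbContext> options)
        : base(options) { }

    public DbSet<BookView> BookViews { get; set; }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<BookView>(entity =>
        {
            // Map to the Books table owned by the ...Books bounded context,
            // but via ToView so EF Core treats it as read-only and this
            // DbContext's migrations won't try to create or change that table
            entity.ToView("Books");
            entity.HasKey(x => x.BookId);
        });
    }
}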

Pros and cons of this approach

In terms of DDD’s mapping approaches this is a shared kernel, which makes the two bounded contexts tightly linked. The fact that the …Orders bounded context only accesses a few of the columns in the Books table reduces the linking between the two bounded contexts, because the …Books bounded context could add new columns without any need to change the …Orders code.

The positives of this approach are that it’s easy to set up and it works with bounded contexts turned into NuGet packages (see article 2).

EXTRA: How to set up multiple DbContexts using EF Core

Setting up separate DbContexts for each bounded context does make using EF Core’s migration feature a little more complex. Here are the steps you need to follow to use migrations:

  1. Create a DbContext just containing the classes/tables in your bounded context (examples: BookDbContext and OrderDbContext)
  2. If any of the DbContexts access the same SQL table, you need to be careful to ensure that two separate migrations don’t both try to change that table. Your options are:
    1. Have only one DbContext map to that SQL table and have the other DbContexts map to it using EF Core’s ToView configuration command. This is the recommended way because it allows you to select only the columns you need, and you only have read-only access.
    2. Choose one DbContext to handle the EF Core configuration of that SQL table and have the other DbContexts use EF Core’s ExcludeFromMigrations configuration command.
  3. Then create an IDesignTimeDbContextFactory<your DbContext> and include the MigrationsHistoryTable option to set a unique name for the migration history table (example: the Orders DesignTimeContextFactory)
  4. When you register each DbContext on startup you need to again add the MigrationsHistoryTable option (example: Startup code in ASP.NET Core)

When you want to create a migration for a DbContext in a bounded context, you need to do that from the project containing that DbContext: see the comments in the BookApp.All OrderDbContext. This approach should be used for each bounded context’s DbContext.
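To make steps 3 and 4 more concrete, here is a hedged sketch of the kind of code involved. The connection string, history table name and class names below are placeholders of mine, not the BookApp’s actual values:

using Microsoft.EntityFrameworkCore;
using Microsoft.EntityFrameworkCore.Design;

// Step 3: a design-time factory so the "dotnet ef migrations add ..." command
// can create the OrderDbContext from within the Orders project
public class OrderDesignTimeContextFactory : IDesignTimeDbContextFactory<OrderDbContext>
{
    private const string ConnectionString =   // placeholder connection string
        "Server=(localdb)\\mssqllocaldb;Database=BookApp;Trusted_Connection=True";

    public OrderDbContext CreateDbContext(string[] args)
    {
        var optionsBuilder = new DbContextOptionsBuilder<OrderDbContext>();
        optionsBuilder.UseSqlServer(ConnectionString,
            dbOptions => dbOptions.MigrationsHistoryTable("__OrdersMigrationsHistory"));
        return new OrderDbContext(optionsBuilder.Options);
    }
}

// Step 4: the same MigrationsHistoryTable name must be used when registering
// the DbContext in the ASP.NET Core startup code
services.AddDbContext<OrderDbContext>(options =>
    options.UseSqlServer(connectionString,
        dbOptions => dbOptions.MigrationsHistoryTable("__OrdersMigrationsHistory")));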

EXAMPLE 2: Exchanging data via a method call

One big advantage of a monolith is you can call methods, which are quick and don’t have any communication problems. But we have isolated the bounded contexts from each other, so how can we call a method in another bounded context without breaking the isolation rules? The solution is to use interfaces and dependency injection (DI). The interface provides the isolation and DI provides the correct method.

As an example of this approach, let’s say that the address that the Order should be sent to is stored in the Users bounded context. To make this work without breaking the isolation we do the following:

  1. You place the following items in the BookApp.Common.Domain layer, because only Common layers can be accessed by multiple bounded contexts (see the rules about Common projects I defined in part 1 of this series):
    1. The interface IUserAddress that defines the service that the BookApp.Orders can call to obtain an Address class for a specific UserId.
    2. The Address class that the service will return.
  2. You create a class called UserAddress in the BookApp.Users bounded context that implements the IUserAddress interface defined in step 1. You would most likely put that class in the BookApp.Users.Infrastructure layer.
  3. You arrange for the UserAddress / IUserAddress pair to be registered with the DI provider.
  4. Finally, in your BookApp.Orders you obtain an instance of the UserAddress via DI and use it to get the Address you need.

The figure below shows this setup.
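In code, this setup might look something like the sketch below. The IUserAddress, Address and UserAddress names come from the steps above, but the method name, the properties and the UserDbContext are illustrative assumptions of mine:

// In BookApp.Common.Domain - shared by both bounded contexts
public class Address
{
    public string Street { get; set; }     // property names are illustrative
    public string City { get; set; }
    public string Postcode { get; set; }
}

public interface IUserAddress
{
    Address GetUserAddress(int userId);    // method name is an assumption
}

// In BookApp.Users.Infrastructure - the supplier side's implementation
public class UserAddress : IUserAddress
{
    private readonly UserDbContext _context;   // the Users bounded context's own DbContext

    public UserAddress(UserDbContext context)
    {
        _context = context;
    }

    public Address GetUserAddress(int userId)
    {
        // look up the user and map their address into the shared Address class
        return _context.Users
            .Where(u => u.UserId == userId)
            .Select(u => new Address
                { Street = u.Street, City = u.City, Postcode = u.Postcode })
            .Single();
    }
}

// Registered with the DI provider on startup
services.AddTransient<IUserAddress, UserAddress>();

// In BookApp.Orders - the service is injected via the IUserAddress interface,
// so the Orders code never references the BookApp.Users projects
var address = _userAddressService.GetUserAddress(order.UserId);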

Pros and cons of this approach

In terms of DDD’s mapping approaches this is a customer / supplier mapping approach – the customer is the BookApp.Orders and the supplier is the BookApp.Users. The interface provides good isolation for the service, but sharing the Address class does link the two bounded contexts.

From the development point of view, you have to organise your code in three places, which is a bit more work. Also, this approach isn’t that good when working with bounded contexts that have been turned into NuGet packages.

Overall, this approach provides better isolation than the exchange via the database but takes more effort.

EXAMPLE 3: Exchanging data using a request/reply message broker

The last mapping implementation used the .NET DI provider, which meant any interfaces and classes used had to be in a .NET project that both bounded contexts can access. There is another way to automate this: using a request/reply message broker. This allows you to set up mapping links between two bounded contexts without breaking the isolation rules.

Let’s implement the same feature as in example 2, that is, getting the address for the Order from the Users bounded context. Here are the steps:

  1. Register a request/reply message broker as a singleton to the DI provider.
  2. In the BookApp.Users bounded context register a get function with the message broker. This function returns an Address class defined in the BookApp.Users bounded context.
  3. In the BookApp.Orders bounded context call the ask method on the message broker. You will receive the address data mapped into an Address class defined in the BookApp.Orders bounded context.

The figure below shows this setup.

The message broker allows you to register a getter function (left side of the figure) that can be called by the AskFor method (right side of the figure). This is equivalent to calling a method as in example 2, but doesn’t need any external interfaces or classes. Also, in this example there are two classes called Address, one in each bounded context, and the message broker can map between the two, thus removing the need for the extra common layer we needed in example 2.
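To show the idea, here is a minimal in-memory sketch I have made up to illustrate the pattern; it is not the API of any particular broker implementation:

using System;
using System.Collections.Generic;
using System.Text.Json;

// A minimal, in-memory request/reply broker sketch
public class RequestReplyBroker
{
    private readonly Dictionary<string, Func<object, object>> _getters = new();

    // The supplier bounded context registers a getter function under a known key
    public void RegisterGetter<TIn, TOut>(string key, Func<TIn, TOut> getter)
        => _getters[key] = input => getter((TIn)input);

    // The consumer bounded context asks for data and receives it mapped into its
    // own class, so the two bounded contexts never need to share a type
    public TOut AskFor<TOut>(string key, object input)
    {
        var result = _getters[key](input);
        // copy the supplier's class into the consumer's class, e.g. via JSON
        return JsonSerializer.Deserialize<TOut>(JsonSerializer.Serialize(result));
    }
}

// BookApp.Users side: register a getter that returns its own Address class, e.g.
//   broker.RegisterGetter<int, Address>("GetUserAddress", userId => LookupAddress(userId));
// BookApp.Orders side: ask for the data, receiving it as the Orders' own Address class, e.g.
//   var address = broker.AskFor<Address>("GetUserAddress", order.UserId);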

Initially I couldn’t find a request/reply message broker so I built a simple version you can find here, but after more research I found that RabbitMQ has a remote procedure call feature that does this (though my simple version is easier to understand).

NOTE: Microservice architectures normally use a publish/subscribe message broker, where apps register to be informed if certain data changes. This is done to improve performance by sending updates of the data a Microservice app needs so that it can cache the data locally. However, in a monolith architecture you can access data anywhere in the app in nanoseconds (just an in-memory dictionary lookup and a function call), so a request/reply message broker is more efficient.

Pros and cons of this approach

This is another customer / supplier mapping approach, as used in example 2, but with more isolation, due to the request/reply message broker being able to copy data from one type to another.

From a development point of view this is easier than example 2, which called a method using DI, because you don’t have to add the BookApp.Common.Domain layer to share the interface and class. The other advantage of this message broker approach is that it works with bounded contexts turned into NuGet packages.

There aren’t any downsides other than learning how to use a request/reply message broker.

Overall, I think this approach is quick to implement, provides excellent isolation and works with bounded contexts turned into NuGet packages.

EXAMPLE 4: Adding new modular code to a legacy application

There are lots of existing applications out there, some of which don’t have a good design and have fallen into “a ball of mud” – we call these legacy applications. So, the challenge is to apply a modular monolith architecture to an existing application without the “ball of mud” code “infecting” your new code.

The solution I have defined uses three parts to add new code to a legacy application.

  • Build your new feature in a separate solution: This gives you a much better chance of building your code using modern approaches such as the modular monolith architecture.
  • Install your new feature via a NuGet package: Packaging your new feature into a NuGet package makes it much easier to add your new code to the existing application.
  • Use DDD’s Anticorruption Layer (ACL) mapping approach: The ACL mapping approach builds adapters between the existing application’s code and concepts and the new code you have written for the new feature.

The figure below shows how this might work.

You already know about separate solutions and NuGet packaging from part 2 of this series, so here I concentrate on the ACL and how it works.

DDD’s ACL mapping approach is designed for interfacing to a legacy system. It assumes that a) the legacy system’s code cannot be easily changed, and b) the legacy system’s design is suboptimal. The ACL mapping approach hides the more difficult parts of the legacy system by using the adapter pattern, which allows you to write your new code against a “cleaned up” interface.
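As a rough illustration of what an ACL adapter looks like, here is a sketch where all the names are invented for this example, not taken from a real legacy system:

// The clean interface and DTO that your new feature's code works against
public interface ICustomerReader
{
    CustomerDto GetCustomer(int customerId);
}

public class CustomerDto
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// The ACL adapter: the only place that knows about the legacy system's
// awkward types, naming and calling conventions
public class LegacyCustomerAdapter : ICustomerReader
{
    private readonly LegacyCustomerService _legacyService;  // existing class in the legacy code

    public LegacyCustomerAdapter(LegacyCustomerService legacyService)
    {
        _legacyService = legacyService;
    }

    public CustomerDto GetCustomer(int customerId)
    {
        // the legacy quirks (string ids, odd column names) are dealt with here
        var legacyCustomer = _legacyService.FindCust(customerId.ToString());
        return new CustomerDto
        {
            Id = customerId,
            Name = legacyCustomer.CUST_NAME?.Trim()
        };
    }
}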

Of all the DDD mapping patterns the ACL mapping provides the highest level of separation between the legacy system and your new code. The downside is that, of all the DDD mapping approaches, the ACL takes the most development effort to create.

While I have described how to add new code to a legacy system, I have to say that it isn’t a simple job. My experience is that fully understanding a legacy system’s code is far harder and takes longer than writing the ACL layer code.

The understanding and unscrambling of a legacy system is a big topic and I’m not going to cover it here, but you might like to look at a few of the links I have listed below:

NOTE: You might also be interested in the strangler pattern if working with existing applications. This pattern provides a way to progressively change your old code to a more modern code design.

Pros and cons of this approach

DDD’s ACL mapping approach provides excellent separation between the two parts but takes a lot of development effort to build. Therefore, you should only use it when there is no other way. However, it’s not building the ACL mapping that is the hard part; the hard part is working out how the legacy system works so that you can add your new feature.

Conclusion

I have described four examples of communicating between DDD bounded contexts. From a DDD point of view I didn’t cover all of DDD’s bounded context mapping approaches in this article, but I did cover the four main ways to implement communication between bounded contexts in .NET monolith applications.

As you have seen, it’s a balance between how much the communication ties the design of the two bounded contexts together and the amount of development effort it takes to write the communication link. The list below provides a summary of the pros and cons of each approach I cover in this article.

  • Exchanging data via the database (example 1)
    • Pro: fairly easy to implement
    • Con: some linking between bounded contexts (DDD shared kernel)
    • Limitations: none
  • Exchanging data via a method call (example 2)
    • Pro: good performance, easy to implement
    • Cons: Needs extra common layer to share interfaces/classes
    • Limitations: Doesn’t work with bounded context NuGet packages
  • Exchanging data using a request/reply message broker (example 3)
    • Pro: good performance, easy to implement
    • Cons: You need a request/reply message broker
    • Limitations: none
  • Adding new modular code to a legacy application (example 4)
    • Pro: allows you to write new code using a modern design
    • Cons: A LOT of work
    • Limitations: none

NOTE: The “exchanging data via the database” example also contains extra information on how to create individual EF Core DbContexts for each bounded context that has to access the database.

I hope this article, plus the others in the series, have been useful to you.

Happy coding!

Evolving modular monoliths: 2. Breaking up your app into multiple solutions

Last Updated: May 10, 2021 | Created: May 10, 2021

This is the second article in a series about building Microsoft .NET applications using a modular monolith architecture. This article covers a way to extract parts of your application into separate solutions which you turn into NuGet packages that are installed in your main application. Each solution is physically separated from the others in a similar way to a Microservice architecture, but without the performance and communication failure modes that Microservices can have.

My view is that the Microservice architecture is great for applications that need large development teams and/or have to handle high levels of demand, like Netflix. But for smaller applications the Microservice architecture can be overkill. However, I do like the Microservice idea of having separate solutions, because that makes the application easier to understand, refactor and manage, so I have defined a way to extract parts of an application into their own solutions. The result is an application that is easier to build because it’s using the simpler monolith architecture, but has the benefits of having multiple separate solutions that are combined using NuGet.

NOTE: If you are planning to build a Microservice architecture, then the approach described here is also a great starting point because it already creates separate solutions. Martin Fowler also suggests that starting with a monolith approach is the best way to build a Microservice application – see his article called “Monolith First”.

This article is part of the Evolving Modular Monoliths series, the articles are:

TL;DR – summary

  • One of the best ways to structure your application is to break the business needs into what DDD calls bounded contexts (see this section in the first article for more on bounded contexts).
  • In a modular monolith some bounded contexts can be large and/or complex. In these cases, giving a large/complex bounded context its own solution makes the code easier to understand, navigate, and manage.
  • I have created a dotnet tool called MultiProjPack that automates the creation of the NuGet package for projects that follow the naming convention described in the first article.
  • I describe a fast local build/test/debug cycle when adding/changing code in a separate solution. This uses a local NuGet package source and some features in the MultiProjPack tool.
  • I describe the options for getting source code information while debugging a NuGet package inside your main application.
  • I finish with a section on building the composite application for deployment to production and suggest ways to store your private NuGet packages.

Breaking up your app into multiple solutions

The idea is to get the separation that a Microservice pattern provides while keeping the performance and reliability of direct method or database access. To do this we extract a DDD bounded context section of your code (see the section on bounded contexts in the first article) into its own solution and then turn it into a NuGet package to install in your main application.

Turning a DDD bounded context into a separate solution/NuGet package requires a bit more work, so I suggest you only apply this approach to large and/or complex parts of your application. The benefits are:

  • The code is easier to understand/navigate because it’s isolated into its own solution.
  • If the development team is large, it’s easier for one team to work on an isolated solution and “publish” a NuGet package for other teams to use.
  • On older applications this approach means you can build new features using a more modern design without the structure of the old application hindering your new design.

I have created two example repos that show this approach in action, where I built another version of the e-commerce web app that sells books, called BookApp. The main application is called BookApp.Main (see https://github.com/JonPSmith/BookApp.Main for the code), which contains the FrontEnd and the Order processing parts (BookApp.Orders…). The part of the application that deals with the querying and updating of the books in the database (referred to as BookApp.Books) is large enough to warrant turning into its own solution (see https://github.com/JonPSmith/BookApp.Books for the code). The BookApp.Books solution is turned into a NuGet package and installed in BookApp.Main. The figure below shows this in action.

This makes it much easier for a development team to work on a part of the application in a separate repo/solution. Also, the team can “publish” a new version via a private NuGet server, with a fallback to the old version provided by simply changing back to the previous NuGet package.

NOTE: One limitation of using a NuGet package is that all your projects in the solution must target the same framework, e.g. net5.0. That’s because NuGet is designed to handle packages that work with multiple frameworks; for instance the Newtonsoft.Json package can work with seven types of frameworks (see its dependencies). But in this usage you want all of your projects to target the same framework, otherwise it won’t work.

Communication between the front-end and a NuGet package

Turning a bounded context into a NuGet package gives you the benefit of separation that a Microservice has. But while a Microservice architecture has one API front-end per Microservice, in our modular monolith design there is a direct code link, via the NuGet package, to the BookApp.Books code.

This changes the communication channel we use between each bounded context and the user. In a Microservice design the front-end typically accesses a service via an HTTP API, but in our modular monolith design it accesses a service via dependency injection and a method call. So, in a modular monolith the API is defined mainly by interfaces and dependency injection, plus a few key class definitions. And because we are using a monolith architecture there is one front-end in BookApp, which is an ASP.NET Core project.

NOTE: There are other communication paths between two bounded contexts, which I cover in part 3.

The API is mainly defined by the ServiceLayer, which contains many of the services linked to a bounded context (read this section about the ServiceLayer in one of my articles). For instance, in my example BookApp.Main I want to display a list of books with sorting, filtering and paging. This uses a service referred to by the interface IListBooksService in the …ServiceLayer.GoodLinq project. The ServiceLayer also contains some classes, such as the BookListDto class that provides the data to display the books, and various other classes, interfaces, constants etc.

NOTE: For Separation of Concerns (SoC) reasons I also recommend adding a small project specifically to handle any startup code, e.g. registering services with the dependency injection provider. Also, in applications that load data from a configuration file (e.g. appsettings.json), I pass in the IConfiguration interface too in case the code needs it – see BookApp.Books.AppSetup as an example.
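A hedged sketch of what such a startup project’s registration code might look like is shown below. The extension method name and the ListBooksService class are my own illustrative names; IListBooksService is the interface mentioned above, and the real code is in BookApp.Books.AppSetup:

using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;

public static class BooksStartupExtensions
{
    // Called from the main application's Startup.ConfigureServices
    public static void RegisterBooksServices(
        this IServiceCollection services, IConfiguration configuration)
    {
        // register the bounded context's own DbContext
        services.AddDbContext<BookDbContext>(options =>
            options.UseSqlServer(configuration.GetConnectionString("BookApp")));

        // register the bounded context's services against their interfaces
        services.AddTransient<IListBooksService, ListBooksService>();
        // ... other registrations for the BookApp.Books bounded context
    }
}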

How to create a NuGet package

Now that we have defined how the bounded context will link to the main application, we need to create a NuGet package. It turns out that creating a NuGet package containing multiple .NET projects is doable, but it takes a lot of work to build the .nuspec file properly.

To automate the creation of the NuGet package I built a dotnet tool called MultiProjPack (repo found here) that builds the .nuspec file by scanning your solution for certain projects/namespaces and then builds a NuGet package for you. This tool also contains features to make it much quicker to build and test your NuGet package on your development computer by using a local NuGet package source, which I describe later.

The MultiProjPack tool (which is itself a NuGet package) can be installed or updated on your computer with the following command line commands:

dotnet tool install JonPSmith.MultiProjPack --global

dotnet tool update JonPSmith.MultiProjPack --global

Once you have installed MultiProjPack you call this dotnet tool from the command line in one of the projects in your solution (you can use any project, but I normally use the BookApp.Books.AppSetup project). Here is how you call it, selecting one of its three options:

MultiProjPack <D|R|U>

The three options do the following

  • D(ebug): This creates a NuGet package using the Debug configuration of the code.
  • R(elease): This creates a NuGet package using the Release configuration of the code.
  • U(pdate): This builds a NuGet package using the Debug configuration, but also updates the .dll files in the NuGet cache (I explain this in this section).

The tool relies on an xml file called MultiProjPack.xml in the folder you run the tool from. This file defines the NuGet data (name, version and so on) and optional tool settings. Here is a typical setup, but there are a lot more settings if you need them (see this example file containing all the settings and the README file for more information).

<?xml version="1.0" encoding="utf-8"?>
<allsettings xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<!-- this contains the typical information you should have in your settings -->
  <metadata>
    <id>BookApp.Books</id>
    <version>1.0.0-preview001</version>
    <authors>you must give a list of author(s)</authors>
    <description>you must provide a description of the NuGet</description>
    <releaseNotes>optional: what is changed in this release?</releaseNotes>
  </metadata>
  <toolSettings>
    <!-- This is used to find projects with names starting with this. If null, then uses NuGet id -->
    <NamespacePrefix></NamespacePrefix>
    <!-- excludes named projects (comma separated), e.g. "Test" would exclude a project starting with "BookApp.Books.Test" -->
    <ExcludeProjects>Test</ExcludeProjects>
    <!-- worth filling in with your local NuGet Package Source folder. See docs about using {USERPROFILE} in string -->
    <CopyNuGetTo>{USERPROFILE}\LocalNuGet</CopyNuGetTo> 
  </toolSettings>
</allsettings>

NOTE: Don’t worry, the tool has the command --CreateSettings that will create the MultiProjPack.xml with the typical settings for you to edit.

When you run the tool it:

  1. Scans all the folders for .NET projects whose names start with the toolSettings.NamespacePrefix value, e.g. BookApp.Books, and creates a .nuspec file containing all the .NET projects it found.
  2. It then calls dotnet pack to create the NuGet package using the .nuspec file created in step 1.
  3. If you have set up the correct <toolSettings> it will update your local NuGet package source. I explain this later.
  4. If you ran it in U(pdate) mode it will replace the data in the NuGet cache. I explain this later.  

But before I describe steps 3 and 4 of running the MultiProjPack tool, let’s consider how you might debug an application that uses a NuGet package.

Debugging an application that uses a modular monolith NuGet package

There is a problem when developing an application using the modular monolith NuGet package approach, and that is the time it takes to upload a NuGet package to nuget.org (or any other web-based NuGet server). That upload can take a few minutes, which doesn’t make for a good development process. But be of good cheer – I have solved this problem, and it takes less than a second to upload a NuGet package when testing locally. But before I describe the solution let’s look at the development process first.

For this example, you want to add a new feature, say adding a wish list where users can tell other people which books they would like for their birthday. Your application is called BookApp.Main, which contains the ASP.NET Core FrontEnd, and you decide to add the wish list feature to the BookApp.Books code, which is in a separate solution and installed in the BookApp.Main application via a NuGet package.

The development process would require five things:

  1. Add new pages and commands in the FrontEnd code in your BookApp.Main application.
  2. Add new services in the part of the application that you have in separate BookApp.Books solution and write some unit tests to make sure they work.
  3. Then create a new version of the NuGet package from the BookApp.Books solution.
  4. Then you install the new version of the BookApp.Books NuGet package in the BookApp.Main application.
  5. Then you can try out your new feature by running the BookApp.Main locally with dummy data.

If you are a genius, you might be able to write both parts and have it all work first time, but most people like me need to go around these five steps many times. And if you had to wait a few minutes for each NuGet package upload, it would be a very painful development cycle. Thankfully there are ways around this – the first is using a local NuGet server on your development computer.

1. Using a local NuGet server to reduce the package upload time to milliseconds

It turns out you can define a folder on your development computer (Windows, Mac or Linux) as a NuGet package source. This means the upload is now just a copy of your new NuGet package into a folder on your development computer, so it’s very fast. That means your development testing will also be fast.

To add a local NuGet package source via Visual Studio you need to get to its NuGet package sources page. The command to get to that page is Tools > NuGet Package Manager > Package Manager Settings > Package Sources. Then you can add a new package source that is linked to a folder on your computer. Below is a screenshot of my NuGet package sources page where I have added a local folder as a possible NuGet package source (see selected source).

Once you have done that, any NuGet packages in the folder you have defined will show up in the NuGet Package Manager display if the local NuGet package source (or All) is selected. See the example below where I have selected my Local NuGet in the package source, which is highlighted in yellow.

NOTE: If you are using VSCode and dotnet commands, then see this article about setting up a folder on your local computer as a NuGet package source.
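If you prefer the command line, the dotnet CLI can also register a folder as a package source (the folder path below is just a placeholder):

dotnet nuget add source "C:\Users\YourName\LocalNuGet" --name "Local NuGet"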

To make this even easier/quicker the MultiProjPack tool has a feature that will copy the newly created NuGet package directly into your local folder. To turn on that feature you need to fill in the <toolSettings>.<CopyNuGetTo> setting with your local NuGet package folder. The CopyNuGetTo setting supports the string {USERPROFILE} to get your user’s account folder, which means it works for any developer. For instance, the string {USERPROFILE}\LocalNuGet would become C:\Users\JonPSmith\LocalNuGet on my computer. That means your MultiProjPack.xml doesn’t have to change for each developer on the team.

The end result of setting the CopyNuGetTo path is that when you run the command MultiProjPack D (or R) a new NuGet package will appear in your local NuGet package source, and you can immediately add or update your main application with that NuGet package.

2. Directly updating the NuGet package cache

The local NuGet package source is great, but you need to increment the NuGet version for each package, because once a package is in the NuGet package cache you can’t override it. I found that a pain, so I looked for a better solution, and found one by directly updating the NuGet package cache.

The NuGet package cache speeds up the performance of builds using NuGet packages. When you install a new NuGet package it adds an unpacked version of the package to the NuGet package cache. The unpacked version of the NuGet package contains all the files in the NuGet package, e.g. the code (.dll), symbol files (.pdb) and documentation files (.xml), and these are copied into your application on certain actions, e.g. Build > Rebuild Solution, Restore NuGet packages, etc.
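If you want to see where the NuGet package cache (the global-packages folder) is on your machine, the dotnet CLI can tell you:

dotnet nuget locals global-packages --list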

I take advantage of this feature to speed up the whole write-code, install, test cycle (steps 3 to 5 of the development process already described) by directly updating the NuGet package cache at the same time, which is triggered by the MultiProjPack U(pdate) command. This is useful when you have created and installed a NuGet package with new code, but that code has a bug. Instead of you having to create a new version of the NuGet package, the U(pdate) command updates the existing NuGet package, both in the local NuGet package source and in the NuGet package cache.

The end result is that, after running the MultiProjPack U(pdate) command, you use Visual Studio’s command Build > Rebuild Solution, which will rebuild your main application using the files in the NuGet package cache. That cuts out two manual steps (changing the NuGet package version and updating the NuGet package in the main application), but there is a limitation.

The limitation is that if you add, remove or update a .csproj file (say, by adding a NuGet package it needs), then you can’t use the U(pdate) command. In these cases you have to create a new NuGet package with a higher version number. The MultiProjPack U(pdate) command is most useful when you are testing or fixing a bug and need to go around the cycle multiple times.

NOTE: For users that don’t like the idea of changing the cache, another way to go is to add a new NuGet version using the -m: option, e.g. MultiProjPack D -m: version=1.0.0.1-preview002. This overrides the version in the MultiProjPack.xml file, thus saving you from having to edit the MultiProjPack.xml every time, but then you need to manually update the NuGet package via the NuGet Package Manager display.

These two features reduce the minutes it takes to upload a NuGet package to nuget.org to a single command that takes seconds to update the NuGet package in your main application.

Tips on how to debug your NuGet package

As part of the build/test/debug cycle you will install or update your NuGet package in the main application to check it works. If something fails in your main application you might want to debug code that is in your NuGet package. Here are some tips on how to do this.

The first thing you need to do is turn off the “Just My Code” setting in the Debugger (see figure below). This allows you to step into the NuGet package’s code, set breakpoints, see the local data, and so on inside your NuGet package.

The second thing you need to know is how Visual Studio accesses the code introduced by your NuGet package. There are various ways to add the source code, names of local variables, stack traces, and so on into your NuGet package. There are two ways to add source code information to a NuGet package for debugging: embedding the code in the .dll file, or using symbols files.

1. Embedding the source code in the .dll file

You can embed the source code in the code (.dll) file, which makes it easy for Visual Studio to find the source code related to the .dll file. This works well, but requires you to add some extra xml commands to the .NET project’s .csproj file.

The downside of this approach is that it makes the .dll file bigger, so I suggest you only add the source code to the .dll file in a Debug configuration build. To do this you need to add the following xml commands to the .csproj file of each .NET project where you want this feature.

  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|AnyCPU'">
    <DebugType>embedded</DebugType>
    <EmbedAllSources>true</EmbedAllSources>
  </PropertyGroup>

The next approach, using symbol files, is more automatic, but has some limitations.

2. Using the symbols files

By default, a .NET project will create a code (.dll) file and a symbols (.pdb) file in both the bin\{framework} and obj\{framework} folders when the project is built, and the MultiProjPack tool will add them to the NuGet package. Visual Studio can use these when debugging, but if you change the code in a project in the NuGet package Visual Studio will stop you debugging that project.

Visual Studio is very clever about finding debug information. If you open Visual Studio’s Debug > Windows > Modules window you will see all the .dll files in your application, and some of them will have a symbol status of loaded. By looking at this I can see that Visual Studio can find the folder holding the solution for the NuGet package, and it can access the symbols (.pdb) file of each project (I don’t know how Visual Studio does that, but it does).

This all sounds perfect, but as I said, if you change the code in a project in your NuGet package then Visual Studio will stop linking the symbols file of the changed code. That means you can’t debug the changed project anymore. The good news is that the MultiProjPack U(pdate) command does work here, and you can still debug the project via its symbols.

NOTE: There are different formats for symbols files, which is a bit confusing. Also symbols files and NuGet symbol servers aren’t supported everywhere. The MultiProjPack tool will copy symbols (.pdb) files into the NuGet package if they are found in the project. It can also create a NuGet symbol package if you need it (see this setting).

What to do for production?

MultiProjPack’s D(ebug) and U(pdate) commands create NuGet packages using the Debug configuration of your code, and these typically live in the local NuGet package source. But what should you do when you want to release your code for production?

Firstly, you should build a NuGet package using the Release configuration of the code with the R(elease) command, i.e. MultiProjPack R. The created NuGet package will be in a folder called .nupkg in the folder where you ran the MultiProjPack tool. Now you need to store this release NuGet package somewhere so others can use it. How you do that depends on how you work and whether you want your NuGet packages to be private or not.

By default, the git ignore file will ignore NuGet packages, so if you create a package you need to store your Release packages somewhere other than your Git repo. If your NuGet package can be public, then it’s easy – use https://www.nuget.org/. For holding NuGet packages privately, here are some possibilities:

  • GitHub: GitHub allows you to publish and consume NuGet packages and they can be private, even for free accounts. See the GitHub NuGet docs and this article by Bruno Hildenbrand.
  • Azure DevOps: Azure provides a way to publish and consume NuGet packages within a DevOps pipeline. See the Azure NuGet docs and this article by Greg Margol.
  • MyGet: MyGet has been around for years and used by a host of companies, but it costs.
  • BaGet: Scott Hanselman’s article recommends BaGet, but at the time of writing it doesn’t have a private feed version.

Conclusion

In my quest to modularize a large monolith architecture I wanted to mimic the Microservice pattern approach of breaking the business problem into separate applications that communicate with each other. To do this I used NuGet packages to implement the “separate” solutions part of the Microservice architecture in a monolith application, using DDD bounded contexts to guide how to break my application into discrete parts.

It does take a bit more work to use NuGet packages, but as a working developer I have strived to make this approach as automated and fast as possible. That’s why I built the MultiProjPack tool, which is a key part of this approach: it both creates the correct NuGet package and has features to make your build/test/debug cycle as quick as possible.

The first article described two ways to modularize a monolith so that its code is less likely to turn into “a ball of mud”. This article takes this to another level of separation by allowing you to build parts of your application as individual solutions and combine them into the main application using NuGet. The missing part is how you can communicate between bounded contexts, including bounded contexts built as NuGet packages, which I cover in the third article.

Evolving modular monoliths: 1. An architecture for .NET

Last Updated: May 17, 2021 | Created: May 3, 2021

This is the first article in a series about building a .NET application using a modular monolith architecture. The aim of this architecture is to keep the simplicity of a monolith design while providing a better structure so that as your application grows it doesn’t turn into what is known as “a big ball of mud” (see the original article which describes a big ball of mud as “haphazardly structured, sprawling, sloppy, duct-tape-and-baling-wire, spaghetti-code jungle”).

The designs I describe will create a clean separation between the main parts of your monolith application by using a modular monolith architecture and Domain-Driven Design’s (DDD) bounded context approach. These designs also provide a way to add a modular monolith section to an old application whose architecture isn’t easy to work with (see part 3).

NOTE: This article doesn’t compare a Monolith against a Microservice or Serverless architecture (read this article for a quick review of the three architectures). The general view is that a monolith architecture is the simplest to start with but can easily become “a ball of mud”. These articles are about improving the basic monolith design so that it keeps a good design for longer.

In the Evolving Modular Monoliths series, the articles are:

NOTE: To support this series I created a demo ASP.NET Core e-commerce application that sells books using a modular monolith architecture. For this article the code in the https://github.com/JonPSmith/BookApp.All repo provides an example of the modularized bounded context design described in this article.

Who should read these articles?

These articles are aimed at all .NET software developers and software architects. This article gives an overview of the modular monolith architecture approach, but later articles also look at team collaboration, and the nitty-gritty of checking, testing and deploying your modular monolith application.

I assume the reader is already a .NET software developer, is comfortable with writing C# code, and understands .NET terms such as .NET projects (also known as assemblies), NuGet packages and so on.

What software principles do we need to avoid “a big ball of mud”?

Certain software principles and architectures can help organise the code in your application. The more complex the application, the more you need good software principles to guide you towards code that is understandable, refactorable and robust. Here are some software principles that I will apply in this series:

  • Encapsulation: Isolating the code for each business feature from the others, so that a change to the code for one business feature doesn’t break the code for another.
  • Separation of Concerns (SoC): That is, this design provides high cohesion (links only to relevant code) and low dependency (isn’t affected by other parts of the code).
  • Organisation: Makes it easy to find the various parts of code for a specific feature when you want to change things.
  • Collaboration: Allows multiple software developers to work on an application at the same time.
  • Development effort/reward: We want an architecture that balances speed of development against building an application that is easy to enhance and refactor.
  • Testability: Breaking up a feature into different parts often makes it easier to test the code.

Stages in building an ASP.NET Core application

I am going to take you through three levels of modularization: modularization using an n-layered architecture, modularization at the DDD bounded context level, and finally modularization inside a DDD bounded context. Each level builds on the previous level, ending up with a high level of isolation for your code.

1. Basic n-layered application

I start with the well-known approach of breaking up an application into layers, known as an n-layered architecture. The figure below shows the way I break up an application into layers within my modular monolith design.

You might be surprised by the number of layers I use, but they are there to help apply the SoC principle. Later on, I define some architectural rules, enforced by unit tests, that ensure the code implementing a feature has high cohesion and low dependency.

To help you understand what all these layers are for, let’s look at the creation of an order for books in my BookApp application (you can try this feature by downloading the BookApp.All repo and running it on your computer):

  • Front-end: The ASP.NET Core app handles showing the user’s basket and sending their order to the ServiceLayer.
  • ServiceLayer: This contains a class with a method that accepts the order data and sends it to the business layer. If everything is OK, it commits the order to the database and clears the user’s basket.
  • Infrastructure: Not needed for this feature.
  • BizLogic / BizDbAccess: An order is complex, so it has its own business logic code which is isolated from the database via code held in the BizDbAccess layer (this approach is described here).
  • Persistence: This provides the EF Core code to map the Orders classes to the database.
  • Domain: This holds the Order classes mapped to the database.

The figure above shows the seven layers as the n-layer architecture most developers are aware of. But in fact I am using the clean architecture layering approach, which is depicted as a series of rings, as shown in the figure to the right (click for a larger view). The clean architecture is also known as Hexagonal Architecture or Onion Architecture.

I use the clean architecture layering approach because I like its rules, such as inner layers can’t access outer layers. But any good n-layering approach would work.

I really like the clean architecture’s “inner layers can’t access outer layers” rule, but I have altered some of the clean architecture’s other rules. They are:

  • I added a Persistence layer deep inside the rings to accommodate EF Core. See this section of a previous article which explains why.
  • The clean architecture would place all the interfaces in the inner Domain layer, but I place the interfaces in the same project where the service is defined, for two reasons: a) to me, keeping the interface with the service is a better SoC design, and b) it also means an inner layer can’t call an outer service via DI because it can’t access the interface.

There are two general layering approaches that still apply when we move to a modular monolith architecture. They are:

  • I place as little business code as possible in the Front-end layer; instead the Front-end calls services registered with the dependency injection system to display or input data. I do this because a) the Front-end should only manage the output/input of data, and b) it’s easier to test these services with a method call than to test ASP.NET Core APIs or Pages, which is much harder and slower.
  • The ServiceLayer has a very important role in my applications, as it acts as an adapter between the business classes and the user display/input classes. See the section called “the importance of the Service Layer in the Clean Architecture layers” in a previous article to find out why.

The downsides of only using an n-layered architecture

I have used the n-layered architecture for many years, and it works, but the problem is that the n-layered architecture only applies the SoC principle at the layer level, not within a layer. This has two bad effects on the structure of your application:

  • Layers can get very big, especially the ServiceLayer, which makes it hard to find and change anything.
  • When you do find the code for a business feature, it’s not obvious whether other classes link to this code.

The question is: is there an architecture that would help you to follow the SoC and encapsulation principles? Once I tried a modular monolith architecture (with DDD and clean architecture) I found the whole experience was significantly better than the n-layered architecture on its own. That’s because I knew where the code for a certain business feature was from the name of the .NET project, and I knew that those projects only held code relevant to the business feature I was looking at. See a previous article called “My experience of using modular monolith and DDD architectures” where I reflect on using a modular monolith on a project that was running late.

2. Modularize your code using DDD’s bounded context approach

So, rather than having all your code in an n-layered architecture we want to isolate the code for each feature. One way is to break up your application into specific business groups, and DDD provides an approach called bounded contexts (see this article by Martin Fowler for an overview of bounded contexts).

Bounded contexts are found by looking at the business needs in your application, but identifying bounded contexts can be hard (see the video The Art of Discovering Bounded Contexts by Nick Tune for some tips). Personally, I define some bounded contexts early on, but I am happy to change a bounded context’s boundaries and name as the project progresses and I gain more understanding of the business rules.

NOTE: I use the name bounded context throughout this series, but there are lots of different names around DDD’s bounded context concept, such as domain, subdomain, core domain etc. In many places I should use the DDD term domain, but that clashes with the clean architecture’s usage of the term domain, so I use the term bounded context wherever DDD bounded contexts or DDD domains are meant. If you want some more information on all of the DDD terms around the bounded context, try this short article by Nick Tune.

DDD’s bounded contexts work at the large scale in your business. For example, in my BookApp, as well as displaying books the user can also order books. From a DDD point of view the handling of books and the handling of users’ orders are in different bounded contexts: BookApp.Books and BookApp.Orders – see the figure below.

Each layer in each bounded context has a .NET project containing the code for that layer, with the BookApp.Books .NET projects separate from the BookApp.Orders .NET projects. So, in the figure, the .NET projects in the Books bounded context are completely separate from the .NET projects in the Orders bounded context, which means the bounded contexts are isolated from each other.

NOTE: Another way to keep the bounded context isolated is to build a separate EF Core DbContext for each bounded context with only the tables that the bounded context needs to access. I cover how to do this in part 3 of this series.

Each layer is a .NET project/namespace and must have a unique name, and we want a naming convention that makes it easy for the developer to find the code they want to work on: the figure below shows the naming convention that I found best described the application’s parts.

Bounded contexts also need to share data with each other, but in a way that doesn’t compromise the isolation of each bounded context. There are known design patterns for sharing data between bounded contexts, and I cover these in part 3.

NOTE: I recommend Kamil Grzybek’s excellent series on the Modular Monolith. He uses the same approach as I have described in this section. Kamil’s articles give more detail on the architectural thinking behind this design, while my series introduces some extra ways to modularize and share your code.

3. Modularize inside a bounded context

The problem with only modularizing at the bounded context level is that many bounded contexts contain a lot of code. I can think of client projects I have worked on where a single bounded context contained over a year’s worth of developer effort. That means a single bounded context could become “a big ball of mud” all by itself. For this reason, I have developed a way to modularize within a single bounded context.

NOTE: There is a fully working BookApp built using the modularized bounded context approach at https://github.com/JonPSmith/BookApp.All. It contains 23 projects and provides a small but complex application as an example of how this approach could be applied to a .NET application.

Modularizing at the bounded context level is driven by the high-level business design of your application, while the modularization inside a bounded context is done by grouping all the code for a specific feature and giving it its own .NET project(s). Taking an example from the BookApp used in my book, I created lots of different ways to query the database, so I had one .NET project for each query type in the ServiceLayer. These then link down to the lower layers as shown in the figure below, although I don’t show all the references (for instance nearly every outer layer links to the Domain layer) as that would make the figure hard to understand.

NOTE: As you can see there are lots of .NET projects in the ServiceLayer, a few in the Infrastructure, none in the BizLogic/DbAccess layers and often only one .NET project in the Persistence and Domain layers. Typically, the ServiceLayer has the most projects with the Persistence and Domain layers only containing one project.

Building a modularized bounded context looks like a lot of work, but for me it was a very natural, and positive, change from what I did when using an n-layer architecture. Previously, when using an n-layer architecture, I grouped my code into folders, but with the modular monolith approach the classes etc. that were in each folder are placed in a .NET project instead.

These .NET projects/namespaces must each have a unique name, so I extend the naming convention I showed in the previous bounded context modularization section by adding an extra name on the end of each .NET project/namespace name when needed, as shown below.
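For example, the extended convention produces project names like these (an illustrative list based on the projects mentioned in this series, not the full list in the repo):

BookApp.Books.Domain                   <- Books bounded context, Domain layer
BookApp.Books.Persistence              <- Books bounded context, Persistence layer
BookApp.Books.ServiceLayer.GoodLinq    <- ServiceLayer project with an extra feature name
BookApp.Books.ServiceLayer.Common      <- a "Common" project other ServiceLayer projects may reference
BookApp.Books.AppSetup                 <- the small startup/registration project
BookApp.Orders.Domain                  <- a different bounded context, isolated from the Books projects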

The rules for how the .NET projects in a modularized bounded context can reference each other are pretty simple, but powerful:

  • A .NET project can only reference other .NET projects within its bounded context (see part 3 for how data can exchanged between bounded contexts).
  • A .NET project in an outer layer can only reference .NET projects in the inner layers.
  • A .NET project can access a .NET project in the same layer, but only if its name contains the word “Common”. This allows your code to be DRY (i.e., no duplicate code), but it’s very clear that any .NET project containing “Common” in its name affects multiple features.

NOTE: To ensure these rules are adhered to, I wrote some unit test code that checks your application follows these three rules – see this unit test class in the BookApp.All repo.
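To give an idea of how such a check could be written, here is a simplified sketch of my own (not the actual test class in the repo) that checks the first rule, with the “Common” exception from the third rule, by reflecting over assembly references:

using System;
using System.IO;
using System.Linq;
using System.Reflection;
using Xunit;

public class ProjectReferenceRulesTests
{
    [Fact]
    public void BooksProjectsShouldOnlyReferenceProjectsInTheirOwnBoundedContext()
    {
        // load every BookApp.Books assembly found in the test's output folder
        var bookAssemblies = Directory
            .GetFiles(AppContext.BaseDirectory, "BookApp.Books.*.dll")
            .Select(Assembly.LoadFrom)
            .ToList();

        foreach (var assembly in bookAssemblies)
        {
            var badRefs = assembly.GetReferencedAssemblies()
                .Where(r => r.Name.StartsWith("BookApp.")           // only check our own projects
                            && !r.Name.StartsWith("BookApp.Books.") // must stay inside the bounded context
                            && !r.Name.Contains("Common"))          // unless it's a Common project
                .ToList();

            Assert.Empty(badRefs);  // lists the offending references if a rule is broken
        }
    }
}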

The positive effects of using this modularization approach and its rules are:

  • The code is isolated from the other feature code (using a folder didn’t do that).
  • I can find the code more quickly via the .NET project’s name.
  • I can create unit tests to check that my code follows the modularization rules.

Overall, this modularization approach stops the spaghetti-code jungle part of “a big ball of mud” because now the relationships are managed by .NET project references and you can’t get around them easily. In the end, it’s up to the developer to apply the SoC and encapsulation software principles, but following this modularization style will help you to write code that is easy to understand and easy to refactor.

Thinking about the downside – what happens with large applications?

I learnt a lot about building a .NET application using my modular monolith modularization approach, but my BookApp is very small compared to the applications I work on for clients. So, I need to consider the downsides of building a large application to ensure this approach will scale, because a really big application could have 1,000 .NET projects.

The first issue to consider is, can the development tools handle an application with say 1,000 .NET projects? The recent announcement of the 64-bit Visual Studio 2022, which can handle 1,600 projects, says this won’t be a problem. And even Visual Studio 2019 can handle 1,000 .NET projects (according to a report @ErikEJ found), but another person on Twitter said that 300 .NET projects was too much. However, an application with lots of .NET projects could be tiresome to navigate.

The second issue to consider is, can multiple teams of developers work together on a large application? In my view the bounded context approach is the key to allowing multiple teams to work together, as different teams can work on different bounded contexts. Of course, the teams need to follow the DDD bounded context rules, especially the rules about how bounded contexts communicate with each other, which I cover in part 3 of this series.

The final issue to consider is, how could the modular monolith modularization be applied to an existing application? There are many existing monolith applications, and it would be great if you could add new features using the modular monolith modularization approach. I talk about this in more detail in part 3, but I do see a way to make that work.

An answer to these downsides – break the application into separate packages

While these three downsides could be handled through rules and good team communication, a modular monolith doesn’t have the level of separation that separate solutions provide (as Microservices do). So how can we get that separation while still building a monolith? My answer is to move any of the larger or more complex bounded contexts into their own solutions, pack each solution into a NuGet package, and then install these NuGet packages into the main application.

This physically separates one or more of your bounded contexts from the main application code while keeping the benefits of the monolith’s quick method/data transfer. Turning a bounded context into a separate solution allows a team to work on a bounded context on its own, with easier navigation and no clashing with other teams’ changes. And for existing applications you can create new features in a separate solution using the modular monolith approach and add these new features via NuGet packages to your existing application.

In part 2 I describe how you can turn a bounded context into a separate solution and package it as a NuGet package that can be installed in the main application, with a special focus on making the development cycle take only a few seconds (not the few minutes that nuget.org takes) to create, upload, and install a NuGet package for local testing.

Conclusion

I have introduced you to the modular monolith architecture and then provided two approaches to applying a modular monolith architecture and DDD principles to .NET applications. The first modularizes at DDD’s bounded context level and the second adds extra modularization inside a bounded context.

The question is: will the extra work needed to apply a modular monolith architecture to your application really create an application that is easier to extend over time? My first use of a modular monolith architecture was while writing the book “Entity Framework Core in Action” and it was very positive. Overall, I think it made me slightly faster than using an n-layered architecture because it was easier to find things. But the real benefit was when I added features for performance tuning and added a CQRS architecture, which required a lot of refactoring and moving of code.

NOTE: I recommend you read the sections “Modular Monolith – what was bad?” and “Modular Monolith – how did it fare under time pressure?” for my review of my first use of a modular monolith architecture.

Since my first use of a modular monolith architecture, I have further refined my modular monolith design to handle large application development. In the second article I add a further level of separation for development teams so that large parts of your application can be worked on in their own solutions. As a software developer myself I ensured that the development process is quick and reliable, as it’s quite possible I will use this approach on a client’s application.

Please do leave comments on this article. Happy to discuss the best ways to implement a modular monolith architecture or hear of any experience people have of using a modular monolith architecture.

Happy coding!

Five levels of performance tuning for an EF Core query

Last Updated: March 4, 2021 | Created: February 23, 2021

This is a companion article to the EF Core Community Standup called “Performance tuning an EF Core app” where I apply a series of performance enhancements to a demo ASP.NET Core e-commerce book selling site called the Book App. I start with 700 books, then 100,000 books and finally ½ million books.

This article, plus the EF Core Community Standup video, pulls information from chapters 14 to 16 from my book “Entity Framework Core in Action, 2nd edition” and uses code in the associated GitHub repo https://github.com/JonPSmith/EfCoreinAction-SecondEdition.

NOTE: You can download the code and run the application described in this article/video via the https://github.com/JonPSmith/EfCoreinAction-SecondEdition GitHub repo. Select the Part3 branch and run the project called BookApp.UI. The home page of the Book App has information on how to change the Book App’s settings for chapter 15 (four SQL versions) and chapter 16 (Cosmos DB).

Other articles that are relevant to the performance tuning shown in this article

TL;DR – summary

  • The demo e-commerce book selling site displays books with the various sort, filter and paging features that you might expect to need. One of the hardest of the queries is to sort the books by their average votes (think Amazon’s star ratings).
  • At 700 books a well-designed LINQ query is all you need.
  • At 100,000 books (and ½ million reviews) LINQ on its own isn’t good enough. I add three new ways to handle the book display, each one improving performance but also taking more development effort.
  • At ½ million books (and 2.7 million reviews) SQL on its own has some serious problems, so I swap to a Command Query Responsibility Segregation (CQRS) architecture, with the read-side using a Cosmos DB database (Cosmos DB is a NoSQL database).
  • The use of Cosmos DB with EF Core highlights
    • How Cosmos DB is different from a relational (SQL) database
    • The limitations in EF Core’s Cosmos DB database provider
  • At the end I give my view of performance gain against development time.

The Book App and its features

The Book App is a demo e-commerce site that sells books. In my book “Entity Framework Core in Action, 2nd edition” I use this Book App as an example of using various EF Core features. It starts out with about 50 books in it, but in Part3 of the book I spend three chapters on performance tuning and take the number of books up to 100,000 books and then to ½ million books. Here is a screenshot of the Book App running in “Chapter 15” mode, where it shows four different modes of querying a SQL Server database.

The Book App query which I improve has the following Sort, Filter, Page features

  • Sort: Price, Publication Date, Average votes, and primary key (default)
  • Filter: By Votes (1+, 2+, 3+, 4+), By year published, By tag, (defaults to no filter)
  • Paging: Num books shown (default 100) and page num

Note: a book can be soft deleted, which means there is always an extra filter on the books shown.

The book part of the database (the part of the database that handles orders isn’t shown) looks like this.

First level of performance tuning – Good LINQ

One way to load a Book with its relationships is by using Includes (see code below)

var books = context.Books
    .Include(book => book.AuthorsLink
        .OrderBy(bookAuthor => bookAuthor.Order)) 
            .ThenInclude(bookAuthor => bookAuthor.Author)
    .Include(book => book.Reviews)
    .Include(book => book.Tags)
    .ToList();

But that isn’t the best way to load books if you want good performance. That’s because a) you are loading a lot of data that you don’t need and b) you would need to do the sorting and filtering in software, which is slow. So here are my five rules for building fast, read-only queries.

  1. Don’t load data you don’t need, e.g. use the Select method to pick out only what is needed.
    See lines 18 to 24 of my MapBookToDto class.
  2. Don’t Include relationships but pick out what you need from the relationships.
    See lines 25 to 30 of my MapBookToDto class.
  3. If possible, move calculations into the database.
    See lines 13 to 34 of my MapBookToDto class.
  4. Add SQL indexes to any property you sort or filter on.
    See the configuration of the Book entity.
  5. Add AsNoTracking method to your query (or don’t load any entity classes).
    See line 29 in ListBookService class

NOTE: Rule 3 is the hardest to get right. Just remember that some SQL commands, like Average (SQL AVG), can return null if there are no entries, which means you need a cast to a nullable type to make it work.
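To make rule 3 and that NOTE concrete, here is a minimal sketch of the kind of Select projection the MapBookToDto class builds; the DTO property names here are illustrative, not the Book App’s exact code.

var bookDtos = context.Books
    .AsNoTracking()                          // rule 5: this is a read-only query
    .Select(b => new BookListDto             // rule 1: only load what the list display needs
    {
        BookId = b.BookId,
        Title = b.Title,
        Price = b.Price,
        // rule 3: the average is calculated in the database, and the cast to
        // double? handles books with no reviews (SQL AVG returns null)
        ReviewsAverageVotes = b.Reviews.Select(r => (double?)r.NumStars).Average(),
        ReviewsCount = b.Reviews.Count()     // rule 2: no Include needed
    })
    .ToList();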

So, combining the Select, Sort, Filter and paging my code looks like this.

public async Task<IQueryable<BookListDto>> SortFilterPageAsync
    (SortFilterPageOptions options)
{
    var booksQuery = _context.Books 
        .AsNoTracking()                              //rule 5: read-only query, so no tracking
        .MapBookToDto()                              //rules 1 to 3: the Select projection
        .OrderBooksBy(options.OrderByOptions)        //sort
        .FilterBooksBy(options.FilterBy, options.FilterValue); //filter

    await options.SetupRestOfDtoAsync(booksQuery);   //fills in the rest of the options, e.g. paging values

    return booksQuery.Page(options.PageNum - 1,      //paging is applied last
        options.PageSize); 
}

Using these rules will start you off with a good LINQ query, which is a great starting point. The next sections are about what to do if that doesn’t give you the performance you want.

When the five rules aren’t enough

The query above is going to work well when there aren’t many books, but in chapter 15 I create a database containing 100,000 books with 540,000 reviews. At this point the “five rules” version has some performance problems, so I create three new approaches, each of which a) improves performance and b) takes more development effort. Here is a list of the four approaches, with the Good LINQ version as our base performance version.

  1. Good LINQ: This uses the “five rules” approach. We compare all the other version to this query.
  2. SQL (+UDFs): This combines LINQ with SQL UDFs (user-defined functions) to move the concatenations of the Authors’ names and Tags into the database (a minimal mapping sketch is shown after this list).
  3. SQL (Dapper): This creates the required SQL commands and then uses the Micro-ORM Dapper to execute that SQL to read the data.
  4. SQL (+caching): This pre-calculates some of the costly query parts, like the averages of the Review’s NumStars (referred to as votes).
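The Book App’s own UDF code isn’t reproduced here, but the sketch below shows the general EF Core pattern behind the SQL (+UDFs) approach: a scalar UDF (created by a SQL script or migration) is registered with EF Core so it can be called inside a LINQ Select. The UDF and class names are illustrative.

public static class UdfDefinitions
{
    // Stand-in for a SQL scalar UDF, e.g. dbo.AuthorsStringUdf, that concatenates the
    // authors' names for one book. It is never run in .NET – EF Core translates it into SQL.
    public static string AuthorsStringUdf(int bookId)
        => throw new NotSupportedException("Only for use in LINQ-to-SQL queries.");
}

// In the DbContext's OnModelCreating
modelBuilder.HasDbFunction(
    typeof(UdfDefinitions).GetMethod(nameof(UdfDefinitions.AuthorsStringUdf)));

// Now the concatenation happens in the database, not in .NET code
var dtos = context.Books
    .Select(b => new { b.Title, AuthorsOrdered = UdfDefinitions.AuthorsStringUdf(b.BookId) })
    .ToList();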

In the video I describe how I build each of these queries and show the performance for the hardest query, which is sort by review votes.

NOTE: The SQL (+caching) version is very complex, and I skipped over how I built it, but I have an article called “A technique for building high-performance databases with EF Core” which describes how I did this. Also, chapter 15 of my book “Entity Framework Core in Action, 2nd edition” covers this too.

Here is a chart I showed in the video which provides performance timings for three queries, from the hardest (sort by votes) down to a simple query (sort by date).

The other chart I showed was a breakdown of the parts of the simple query, sort by date. I wanted to show this to point out that Dapper (which is a micro-ORM) is only significantly faster than EF Core if you write better SQL than EF Core produces.

Once you have a performance problem, just taking a few milliseconds off isn’t going to be enough – typically you need to cut its time by at least 33% and often more. Therefore, using Dapper to shave a few milliseconds off EF Core isn’t worth the development time. So, my advice is to study the SQL that EF Core creates and, if you know a way to improve that SQL, then Dapper is a good solution.
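For anyone who hasn’t used Dapper, here is a minimal sketch of the SQL (Dapper) approach. The SQL string and the table/column names are illustrative (the Book App’s real SQL is more involved), but the Dapper call itself is the standard QueryAsync pattern.

using Dapper;                         // adds QueryAsync etc. as extension methods
using Microsoft.Data.SqlClient;

private async Task<List<BookListDto>> SortByVotesWithDapperAsync(string connectionString)
{
    // Hand-written SQL that you have checked is better than what EF Core produces
    var sql = @"SELECT TOP (@pageSize) b.BookId, b.Title, b.Price,
           (SELECT AVG(CAST(r.NumStars AS float))
            FROM Review r WHERE r.BookId = b.BookId) AS ReviewsAverageVotes
    FROM Books b
    WHERE b.SoftDeleted = 0
    ORDER BY ReviewsAverageVotes DESC";

    using var connection = new SqlConnection(connectionString);  // same database EF Core uses
    var books = await connection.QueryAsync<BookListDto>(sql, new { pageSize = 100 });
    return books.ToList();                                       // Dapper maps columns to BookListDto
}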

Going bigger – how to handle ½ million or more books

In chapter 16 I build what is called a Command Query Responsibility Segregation (CQRS) architecture. The CQRS architecture acknowledges that the read side of an application is different from the write side. Reads are often complicated, drawing in data from multiple places, whereas in many applications (but not all) the write side can be simpler, and less onerous. This is true in the Book App.

To build my CQRS system I decided to make the read-side live in a different database to the write-side of the CQRS architecture, which allowed me to use a Cosmos DB database for my read-side. I did this because Cosmos DB is designed for performance (speed of queries) and scalability (how many requests it can handle). The figure below shows this two-database CQRS system.

The key point is that the data saved in the Cosmos DB database has as many of the calculations as possible pre-calculated, rather like the SQL (+caching) version – that’s what the projection stage does when a Book or its associated relationships are updated.
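The read-side class in the book has more to it, but a minimal sketch of such a pre-calculated read-side class (the names are illustrative) looks like this – everything the book list display needs is already flattened into one document.

public class CosmosBook
{
    public int BookId { get; set; }
    public string Title { get; set; }
    public decimal ActualPrice { get; set; }
    public DateTime PublishedOn { get; set; }

    // Pre-calculated by the projection stage whenever a Book (or its Reviews, Authors
    // or Tags) changes, so the Cosmos query never has to join or aggregate anything.
    public string AuthorsOrdered { get; set; }
    public int ReviewsCount { get; set; }
    public double? ReviewsAverageVotes { get; set; }
    public string TagsString { get; set; }
}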

If you want to find out how to build a two-database CQRS system using Cosmos DB then my article Building a robust CQRS database with EF Core and Cosmos DB describes one way, while chapter 16 of my book provides another way using events.

Limitations using Cosmos DB with EF Core

It was very interesting to work with Cosmos DB via EF Core, as there were two parts to deal with:

  • Cosmos DB is a NoSQL database and works differently to a SQL database (read this Microsoft article for one view)
  • The EF Core 5 Cosmos DB database provider has many limitations.

I had already looked at these two parts back in 2019 and written an article, which I have updated to EF Core 5 and renamed to “An in-depth study of Cosmos DB and the EF Core 3 to 5 database provider”.

Some of the issues I encountered, listed with the issues that had the biggest effect on my Book App first, are:

  • EF Core 5 limitation: Counting the number of books in Cosmos DB is SLOW!
  • EF Core 5 limitation: EF Core 5 cannot do subqueries on a Cosmos DB database.
  • EF Core 5 limitation: No relationships or joins.
  • EF Core 5 limitation: Many database functions not implemented.
  • Cosmos difference: Complex queries might need breaking up.
  • Cosmos difference: Skip is slow and expensive.
  • Cosmos difference: By default, all properties are indexed.

I’m not going to go through all of these – the “An in-depth study of Cosmos DB and the EF Core 3 to 5 database provider” article covers most of them.

Because of the EF Core limitation on counting books, I changed the way that paging works. Instead of picking what page you want, you have a Next/Prev approach, like Amazon uses (see the figure after the list of query approaches). And to allow a balanced performance comparison between the SQL versions and the Cosmos DB version I added the best two SQL approaches, but turned off counting too (SQL is slow at that).

It also turns out that Cosmos DB itself can count very fast (the slowness is an EF Core 5 provider limitation), so I built another way to query Cosmos DB using its .NET (pseudo) SQL API. With this the Book App had four query approaches.

  1. Cosmos (EF): This accesses the Cosmos DB database using EF Core (with some parts using the SQL database where EF Core didn’t have the features to implement parts of the query).
  2. Cosmos (Direct): This uses Cosmos DB’s .NET SQL API and I wrote the raw commands – a bit like using Dapper for SQL (see the sketch after this list).
  3. SQL (+cacheNC): This uses the SQL cache approach from the 100,000 books version, but with counting turned off to compare with Cosmos (EF).
  4. SQL (DapperNC): This uses Dapper, which has the best SQL performance, but with counting turned off to compare with Cosmos (EF).
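To show what “Direct” means, here is a minimal sketch of querying Cosmos DB via the Microsoft.Azure.Cosmos .NET SDK. The database/container names and the CosmosBook class are assumptions carried over from the earlier sketch.

using Microsoft.Azure.Cosmos;

private async Task<List<CosmosBook>> ReadBooksDirectAsync(CosmosClient client)
{
    var container = client.GetContainer("BookAppCosmos", "CosmosBooks");  // names are illustrative

    // Cosmos DB's SQL dialect, ordering on a pre-calculated property.
    // Counting is also fast via this API, e.g. "SELECT VALUE COUNT(1) FROM c".
    var query = new QueryDefinition(
        "SELECT * FROM c ORDER BY c.ReviewsAverageVotes DESC OFFSET 0 LIMIT 100");

    var books = new List<CosmosBook>();
    var iterator = container.GetItemQueryIterator<CosmosBook>(query);
    while (iterator.HasMoreResults)
    {
        var response = await iterator.ReadNextAsync();   // one page of results (plus RU charge info)
        books.AddRange(response);
    }
    return books;
}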

The following figure shows the Book App in CQRS/Cosmos DB mode with the four query approaches, and the Prev/Next paging approach.

Performance of the CQRS/Cosmos DB version

To test the performance, I used an Azure SQL Server and a Cosmos DB service from the Azure region local to me (London). To compare the SQL performance and the Cosmos DB performance I used databases with a similar cost (and low enough that it didn’t cost me too much!). The table below shows what I used.

Database type    | Azure service name | Performance units     | Price/month
Azure SQL Server | Standard           | 20 DTUs               | $37
Cosmos DB        | Pay-as-you-go      | manual scale, 800 RUs | $47

I did performance tests on the Cosmos DB queries while I was adding books to the database to see if the size of the database affected performance. It’s hard to get a good test of this as there is quite a bit of variation in the timings.

The chart below compares EF Core calling Cosmos DB, referred to as Cosmos (EF), against using direct Cosmos DB commands via its .NET SQL API – referred to as Cosmos (Direct).

This chart (and other timings I took) tells me two things:

  • The increase in the number of books in the database doesn’t have much effect on the performance (the Cosmos (Direct) 250,000 result is well within the variation).
  • Counting the books costs ~25 ms, which is much better than the SQL count, which added about ~150 ms.

The important performance test was to look at Cosmos DB against the best of our SQL accesses. I picked a cross-section of sorting and filtering queries and ran them on all four query approaches – see the chart below.

From the timings in the figure above, here are some conclusions.

  1. Even the best SQL version, SQL (DapperNC), doesn’t work in this application because any sort or filter on the Reviews took so long that the connection timed out at 30 seconds.
  2. The SQL (+cacheNC) version was at parity or better with Cosmos DB (EF) on the first two queries, but as the query got more complex it fell behind in performance.
  3. The Cosmos DB (direct), with its book count, was ~25% slower than the Cosmos DB (EF) with no count but is twice as fast as the SQL count versions.

Of course, there are some downsides of the CQRS/Cosmos DB approach.

  • The add and update of a book to the Cosmos DB takes a bit longer: this is because the CQRS requires four database accesses (two to update the SQL database and two to update the Cosmos database) – that adds up to about 110 ms, which is more than double the time a single SQL database would take. There are ways around this (see this part of my article about CQRS/Cosmos DB) but it takes more work.
  • Cosmos DB takes longer and costs more if you skip items in its database. This shouldn’t be a problem with the Book App as many people would give up after a few pages, but if your application needs deep skipping through data, then Cosmos DB is not a good fit.

Even with the downsides I still think CQRS/Cosmos DB is a good solution, especially when I add in the fact that implementing this CQRS was easier and quicker than building the original SQL (+cache) version. Also, the Cosmos concurrency handling is easier than the SQL (+cache) version.

NOTE: What I didn’t test is Cosmos DB’s scalability or the ability to have multiple copies of the Cosmos DB database around the world. Mainly because it’s hard to do and it costs (more) money.

Performance against development effort

In the end it’s a trade-off of a) performance gain and b) development time. I have tried to summarise this in the following table, giving a number from 1 to 9 for difficulty (Diff? in the table) and performance (Perf? in the table).

The other thing to consider is how much more complexity your performance tuning adds to your application. Badly implemented performance tuning can make an application harder to enhance and extend. That is one reason why I like the event approach I used on the SQL (+caching) and CQRS/Cosmos DB approaches: it makes the fewest changes to the existing code.

Conclusion

As a freelance developer/architect I have had to performance tune many queries, and sometimes writes, on real applications. That’s not because EF Core is bad at performance, but because real-world applications have a lot of data and lots of relationships (often hierarchical), and it takes some extra work to get the performance the client needs.

I have already used a variation of the SQL (+caching) approach on a client’s app to improve the performance of their “has the warehouse got all the parts for this job?” query. And I wish Cosmos DB had been around when I built a multi-tenant service that needed to cover the whole of the USA.

Hopefully something in this article and video will be useful if (when!) you need to performance tune your application.

NOTE: You might like to look at the article “My experience of using modular monolith and DDD architectures” and its companion article to look at the architectural approaches I used on the Part3 Book App. I found the Modular Monolith architectural approach really nice.

I am a freelance developer who wrote the book “Entity Framework Core in Action“. If you need help performance tuning an EF Core application I am available for work. If you want to hire me, please contact me to discuss your needs.

My experience of using the Clean Architecture with a Modular Monolith

Last Updated: March 18, 2021 | Created: February 11, 2021

In this article I look at my use of the Clean Architecture with the modular monolith architecture covered in the first article. Like the first article this isn’t a primer on the Clean Architecture or the modular monolith approach, but is more about how I adapted the Clean Architecture to provide the vertical separation of the features in the modular monolith application.

  1. My experience of using modular monolith and DDD architectures.
  2. My experience of using the Clean Architecture with a Modular Monolith (this article).

Like the first article I’m going to give you my impression of the good and bad parts of the Clean Architecture, plus a look at whether the time pressure of the project (which was about 5 weeks late) made me “break” any rules.

TL;DR – summary

  • The Clean Architecture is like the traditional layered architecture, but with a series of rules that improve the layering.
  • I built an application using ASP.NET Core and EF Core using the Clean Architecture with the modular monolith approach. After this application was finished, I analysed how each approach had worked under time pressure.
  • I had used the Clean Architecture once before on a client’s project, but not with the modular monolith approach.
  • While the modular monolith approach had the biggest effect on the application’s structure, without the Clean Architecture layers the code would not have been as good.
  • I give you my views of the good, bad and possible “cracks under time pressure” for the Clean Architecture.
  • Overall I think the Clean Architecture adds some useful rules to the traditional layered architecture, but I had to break one of those rules to make it work with EF Core.

A summary of the Clean Architecture

NOTE: I don’t describe the modular monolith in this article because I did that in the first article. Here is a link to the modular monolith intro in the first article.

The Clean Architecture approach (also called the Hexagonal Architecture and Onion Architecture) is a development of the traditional “N-Layer” architecture (shortened to layered architecture). The Clean Architecture approach talks about “onion layers” wrapped around each other and has the following main rules:

  1. The business classes (typically the classes mapped to a database) are in the inner-most layer of the “onion”.
  2. The inner-most layer of the onion should not have any significant external code, e.g. NuGet packages, added to it. This is designed to keep the business logic as clean and simple as possible.
  3. Only the outer layer can access anything outside of the code. That means:
    1. The code that users access, e.g. ASP.NET Core, is in the outer layer.
    2. Any external services, like the database, email sending etc., are in the outer layer.
  4. Code in inner layers can’t reference any outer layers.

The combination of rules 3 and 4 could cause lots of issues, as lower layers will need to access external services. This is handled by adding interfaces to the inner-most layer of the onion and registering the external services using dependency injection (DI).
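As a minimal sketch of that rule (the interface and class names are illustrative, not from the Book App): the Domain layer defines the abstraction and an outer layer supplies the implementation via DI.

// Inner (Domain) layer – no external NuGet packages, just an abstraction
public interface IEmailSender
{
    Task SendAsync(string to, string subject, string body);
}

// Outer layer – the only place allowed to talk to the outside world
public class SmtpEmailSender : IEmailSender
{
    public Task SendAsync(string to, string subject, string body)
    {
        // real SMTP / third-party email API code would live out here
        return Task.CompletedTask;
    }
}

// Composition root (e.g. ASP.NET Core's Startup.ConfigureServices)
services.AddTransient<IEmailSender, SmtpEmailSender>();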

The figure below shows how I applied the Clean Architecture to my application, which is an e-commerce web site selling books, called the Book App.

NOTE: I detail the modifications that I made to the Clean Architecture approach around the persistence (database) layer later in the article.

Links to more detailed information on Clean Architecture (unmodified)

Setting the scene – the application and the time pressure

In 2020, while updating my book “Entity Framework Core in Action”, I built an ASP.NET Core application that sells books called the Book App. In the early chapters it is very simple, as I am describing the basics of EF Core, but in the last section I build a much more complex Book App that I progressively performance tuned, starting with 700 books, then 100,000 books and finally ½ million books. For the Book App to perform well it went through three significant enhancement stages. Here is an example of the Book App’s features and display, with four different ways to display the books to compare their performance.

At the same time, I was falling behind on getting the book finished. I had planned to finish all the chapters by the end of November 2020 when EF Core 5 was due out. But I only started the enhanced Book App in August 2020 so with 6 chapters to write I was NOT going to finish the book in November. So, the push was on to get things done! (In the end I finished writing the book just before Christmas 2020).

My experience of using Clean Architecture with a Modular Monolith

I had used a simpler Clean Architecture on a client’s project I worked on, so I had some ideas of what I would do. The Clean Architecture was useful, but it’s just another layered architecture with more rules, and I had to break one of its key rules to make it work with EF Core. Overall, I think I would use my modified Clean Architecture again in a larger application.

A. What was good about Clean Architecture?

To explain how the Clean Architecture helps, we need to talk about the main architecture’s goals – the modular monolith. The modular monolith focuses on features (Kamil Grzybek calls them modules). One way to work would be to have one project per feature, but that has some problems.

  • The project would be more complex, as it has everything inside it.
  • You could end up with duplicating some code.

The Separation of Concerns (SoC) principle says breaking a feature up into parts that each focus on one aspect of the feature is a better way to go. So, the combination of the modular monolith and using layers provides a better solution. The figure below shows two modular monolith features running vertically, and the five Clean Architecture layers running horizontally. The figure has a lot in it, but it’s there to show:

  • Reduce complexity: A feature can be split up into projects spread across the Clean Architecture layers, thus making the feature easier to understand, test and refactor.
  • Removing duplication: Breaking up the features into layers stops duplication – feature 1 and 2 share the Domain and Persistence layers.

The importance of the Service Layer in the Clean Architecture layers

Many years ago, I was introduced to the concept of the Service Layer. There are many definitions of the Service Layer (try this definition), but for me it’s a layer that knows about the lower / inner layer data structures and about the front-end data structures, and can adapt between the two (see LHS of the diagram above). So, the Service Layer isolates the lower layers from having to know how the front-end works.

For me a Service Layer is a very important layer.

  • It holds all the business logic or database access code that the front-end needs, normally provided as services. This makes it much easier to unit test these services.
  • It takes on the job of adapting data to / from the front end. This means it is the only layer that has to care about the two different data structures.

NOTE: Some of my libraries, like EfCore.GenericServices and EfCore.GenericBizRunner, are designed to work as Service Layer type services, i.e., both libraries adapt between the lower / inner layer data structures and the front-end data structures.

Thus, the infrastructure layer, which is just below the Service Layer, contains services that still work in terms of the entity classes. In the Book App these projects contained code to seed the database, handle logging and provide event handling. Services in the Service Layer, on the other hand, work with both the lower / inner layer data structures and the front-end data structures.

To end the “good” part of the Clean Architecture I should say that a layered architecture could also provide the layering that the Clean Architecture defines. It’s just that the Clean Architecture has some more rules, most of which are useful.

B. Clean Architecture – what was bad?

The main problem was fitting the EF Core DbContext into the Clean Architecture. The Clean Architecture says that the database should be on the outer ring, with interfaces for the access. The problem is there is no simple interface that you can use for the application’s DbContext. Even if you use a repository pattern (which I don’t, and here is why), you still have the problem that the application’s DbContext has to be defined deep in the onion.

My solution was to put the EF Core code right next to the inner circle (named Domain) holding the entity classes – I called that layer Persistence, as that’s what DDD calls it. That breaks one of the key rules of the Clean Architecture, but other than that it works fine. For other external services, such as an external email service, I would follow the Clean Architecture rules and add an interface in the inner (Domain) circle and register the service using DI.

Clean Architecture – how did it fare under time pressure?

Applying the Clean Architecture and modular monolith architectures together took a little more time to think through (I covered this in this section of the first article), but the end result was very good (I explain that in this section of the first article). The Clean Architecture layers broke a modular monolith feature into different parts, thus making the feature easier to understand and removing duplicate code.

The one small part of the clean architecture approach I didn’t like, but I stuck to, is that the Domain layer shouldn’t have any significant external packages, for instance a NuGet library, added to it. Overall, I applaud this rule as it keeps the Domain entities clean, but it did mean I had to do more work when configuring the EF Core code, e.g. I couldn’t use EF Core’s [Owned] attribute on entity classes. In a larger application I might break that rule.
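The Book App’s actual configuration isn’t shown here, but as a minimal sketch of that extra work (the Order/Address classes are illustrative): the EF Core Fluent API in the Persistence layer does what the [Owned] attribute would have done, so the Domain class stays free of EF Core references.

using Microsoft.EntityFrameworkCore;
using Microsoft.EntityFrameworkCore.Metadata.Builders;

// Domain layer: plain classes, no EF Core attributes or packages
public class Order
{
    public int OrderId { get; private set; }
    public Address DeliveryAddress { get; private set; }
}

public class Address
{
    public string Street { get; private set; }
    public string City { get; private set; }
}

// Persistence layer: configure the owned type here instead of using [Owned]
public class OrderConfig : IEntityTypeConfiguration<Order>
{
    public void Configure(EntityTypeBuilder<Order> builder)
    {
        builder.OwnsOne(order => order.DeliveryAddress);
    }
}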

So, I didn’t break any Clean Architecture rules because of the time pressure. The only rule I changed was to make the database access work with EF Core, though I might break the “no significant external packages in the Domain layer” rule in the future.

Conclusion

I don’t think the Clean Architecture approach had as big an effect on the structure as the modular monolith did (read the first article), but the Clean Architecture certainly added to the structure by breaking modular monolith features into smaller, focused projects. The combination of the two approaches gave a really good structure.

My question is: does the Clean Architecture provide good improvements over a traditional layered architecture, especially as I had to break one of its key rules to work with EF Core? My answer is that using the Clean Architecture approach has made me a bit more aware of how I organise my layers, for instance I now have an infrastructure layer that I didn’t have before, and I appreciate that.

Please feel free to comment on what I have written about. I’m sure there are lots of people who have more experience with the Clean Architecture than me, so you can give your experience too.

Happy coding.

My experience of using modular monolith and DDD architectures

Last Updated: May 10, 2021 | Created: February 8, 2021

This article is about my experience of using a Modular Monolith architecture and Domain-Driven Design (DDD) approach on a small, but complex application using ASP.NET Core and EF Core. This isn’t a primer on modular monolith or DDD (but there is a good summary of each with links) but gives my views of the good and bad aspects of each approach.

  1. My experience of using modular monolith and DDD architectures (this article).
  2. My experience of using the Clean Architecture with a Modular Monolith.

I’m also interested in architecture approaches that help me to build applications with a good structure, even when I’m under some time pressure to finish – I want to have rules and patterns that help me to “do the right thing” even when I am under pressure. I’m about to explore this aspect because I was privileged (?!) to be working on a project that was late 😊.

UPDATE – new series

Having finished my book, I have spent some time improving the initial modular monolith design described in this article. Please see the new Evolving Modular Monoliths series.

TL;DR – summary

  • The Modular Monolith architecture breaks up the code into independent modules (C# projects) for each of the features needed in your application. Each module only links to other modules that specifically provide services it needs.
  • Domain-Driven Design (DDD) is a holistic approach to understanding, designing and building software applications.
  • I built an application using ASP.NET Core and EF Core using both of these architectural approaches. After this application was finished, I analysed how each approach had worked under time pressure.
  • It was the first time I had used the Modular Monolith architecture, but it came out with flying colours. The code structure consists of 22 projects, and each (other than the ASP.NET Core front-end) is focused on one specific part of the application’s features.
  • I have used DDD a lot and, as I expected, it worked really well in the application. The classes (called entities by DDD) have meaningfully named methods to update the data, so it is easy to understand.
  • I also give you my views of the good, bad and possible “cracks” that each approach has.

The architectural approaches covered in this article

At the start of building my application I knew it would go through three major changes. I therefore wanted architectural approaches that make it easier to enhance the application. Here are the main approaches I will be talking about:

  1. Modular Monolith: building independent modules for each feature.
  2. DDD: Better separation and control of business logic

NOTE: I used a number of other architectures and patterns that I am not going to describe. They are layered architecture, domain events / integration events, and CQRS database pattern to name but a few.

1. Modular Monolith

A Modular Monolith is an approach where you build and deploy a single application (that’s the “Monolith” part), but you build the application in a way that breaks up the code into independent modules for each of the features needed in your application. This approach reduces the dependencies of a module in such a way that you can enhance/change a module without it affecting other modules. The figure below shows you 9 projects that implement two different features in the system. Notice they have only one common (small) project.

The benefits of a Modular Monolith approach over a normal Monolith are:

  • Reduced complexity: Each module only links to code that it specifically needs.
  • Easier to refactor: Changing a module has less or no effect on other modules.
  • Better for teams: Easier for developers to work on different parts of the code.

Links to more detailed information on modular monoliths

NOTE: An alternative to a monolith is to go to a Microservice architecture. A Microservice architecture allows lots of developers to work in parallel, but there are lots of issues around communications between each Microservice – read this Microsoft document comparing a Monolith vs. Microservice architecture. The good news is a Modular Monolith architecture is much easier to convert to a Microservice architecture because it is already modular.

2. Domain-Driven Design (DDD)

DDD is a holistic approach to understanding, designing and building software applications. It comes from the book Domain-Driven Design by Eric Evans. Because DDD covers so many areas I’m only going to talk about the DDD-styled classes mapped to the database (referred to as entity classes in this article) and give a quick coverage of bounded contexts.

DDD says your entity classes must be in control of the data that they contain; therefore, all the properties are read-only, with constructors / methods used to create or change the data in an entity class. That way the entity class’s constructors / methods can ensure any create / update follows the business rules for that entity.

NOTE: The above paragraph is super-simplification of what DDD says about entity classes and there is so much more to say. If you are new to DDD then google “DDD entity” and your software language, e.g., C#.

Another DDD term is bounded contexts, which is about separating your application into separate parts with very controlled interaction between the different bounded contexts. For example, my application is an e-commerce web site selling books, and I could see that the display of books was different from the customer ordering some books. There is shared data (like the product number and the price the book is sold for) but other than that they are separate.

The figure below shows how I separated the displaying of books and the ordering of books at the database level.

Using DDD’s bounded context technique, the Books bounded context can change its data (other than the product code and sale price) without it affecting the Orders bounded context, and the Orders code can’t change anything in the Books bounded context.

The benefits of DDD over a non-DDD approach are:

  • Protected business rules: The entity classes methods contain most of the business logic.
  • More obvious: entity classes contain methods with meaningful names to call.
  • Reduced complexity: Bounded contexts break an app into separate parts.

Links to more detailed information on DDD

Setting the scene – the application and the time pressure

In 2020 I was updating my book “Entity Framework Core in Action” and my example application was an ASP.NET Core application that sells books called the Book App. In the early chapters the Book App is very simple, as I am describing the basics of EF Core. But in the last section I build a much more complex Book App that I progressively performance tuned, starting with 700 books, then 100,000 books and finally ½ million books. For the Book App to perform well it went through three significant enhancement stages. Here is an example of the Book App’s features and display, with four different ways to display the books to compare their performance.

At the same time, I was falling behind on getting the book finished. I had planned to finish all the chapters by the end of November 2020 when EF Core 5 was due out. But I only started the enhanced Book App in August 2020 so with 6 chapters to write I was NOT going to finish the book in November. So, the push was on to get things done! (In the end I finished writing the book just before Christmas 2020).

NOTE: The ASP.NET Core application I talk about in this article is available on GitHub. It is in branch Part3 of the repo https://github.com/JonPSmith/EfCoreinAction-SecondEdition and can be run locally.

My experience of each architectural approach

As well as experiencing each architectural approach while upgrading the application (what Neal Ford calls evolutionary architecture), I also had the extra experience of building the Book App under serious time pressure. This type of situation is great for learning whether the approaches worked or not, which is what I am now going to describe. But first here is a summary for you:

  1. Modular Monolith = Brilliant. First time I had used it and I will use it again!
  2. DDD = Love it: Used it for years and it’s really good.

Here are more details on each of these two approaches, where I point out:

  1. What was good?
  2. What was bad?
  3. How did it fare under time pressure?

1. Modular Monolith

I was already aware of the modular monolith approach, but I hadn’t used it in an application before. My experience was that it worked really well: in fact, it was much better than I thought it would be. I would certainly use this approach again on any medium to large application.

1a. Modular Monolith – what was good?

The first good thing is how the modular monolith compared with the traditional “N-Layer” architecture (shortened to layered architecture). I have used the layered architecture approach many times, usually with four projects, but the modular monolith application has 22 projects, see figure to the right.

This means I know the code in each project is doing one job, and there are only links to other projects that are relevant to a project. That makes it easier to find and understand the code.

Also, I’m much more inclined to refactor the old code as I’m less likely to break something else. In contrast, the layered architecture on its own has lots of different features in one project and I can’t be sure what it is linked to, which makes me less inclined to refactor code as it might affect other code.

1b. Modular Monolith – what was bad?

Nothing was really wrong with the modular monolith, but working out the project naming convention took some time, and it was super important (see the list of projects in the figure above). By having the right naming convention, the name told me a) where in the layered architecture the project sits and b) what it does (from the end of the name). If you are going to try using a modular monolith approach, I recommend you think carefully about your naming convention.

I didn’t change names too often because of a development tool issue: in Visual Studio you could rename the project, but the underlying folder name wasn’t changed, which makes the GitHub/folder display look wrong. The few renames I did do required me to rename the project, then outside Visual Studio rename the folder and then hand-edit the solution file, which is horrible!

NOTE: I have since found a really nice tool that will do a project/folder rename and updates the solution/csproj files on applications that use Git for source control.

Also, I learnt to not end project name with the name of a class e.g., Book class, as that caused problems if you referred to the Book class in that project. That’s why you see projects ending with “s” e.g., “…Books” and “…Orders”.

1c. Modular Monolith – how did it fare under time pressure?

It certainly didn’t slow me down (other than the time deliberating over the project naming convention!) and it might have made me a bit faster than the layered architecture because I knew where to go to find the code. If I came back to work on the app after a few months, it would be much quicker to find the code I am looking for, and it would be easier to change without affecting other features.

I did find myself breaking the rules at one point because I was racing to finish the book. The code ended up with 6 different ways to query the database for the book display, and there were a few common parts, like some of the constants used in the sort/filter display and one DTO. Instead of creating a BookApp.ServiceLayer.Common.Books project I just referred to the first project. That’s my bad, but it shows that while the modular monolith approach really helps separate the code, it does rely on the developers following the rules.

NOTE: I had to go back to the Book App to add a feature to two of the book display queries, so I took the opportunity to create a project called BookApp.ServiceLayer.DisplayCommon.Books which holds all the common code. That has removed the linking between each query feature and made it clear what code is shared.

2. Domain-Driven Design

I have used DDD for many years and I find it an excellent approach which focuses on the business (domain) issues rather than the technical aspects of the application. Because I have used DDD so much it was second nature to me, but I will try to define what worked, what didn’t work, etc. in the Book App, plus some feedback from working on clients’ applications.

2a. Domain-Driven Design – what was good?

DDD is so massive that I am only going to talk about one of the key aspects I use every day – the entity class. DDD says your entity classes should be in complete control over the data inside them, and their direct relationships. I therefore make all the business classes and the classes mapped to the database have read-only properties and use constructors and methods to create or update the data inside.

Making the entity’s properties read-only means your business logic/validation must be in the entity too. This means:

  • You know exactly where the business logic/validation is – it’s in the entity with its data.
  • You change an entity by calling an appropriately named method in the entity, e.g. AddReview(…). That makes it crystal clear what you are doing and what parameters it needs.
  • The read-only aspect means you can ONLY update the data via the entity’s methods. This is DRY, but more importantly it’s obvious where to find the code.
  • Your entity can never be in an invalid state because the constructors / methods will check the create/update and return an error if it would make the entity’s data invalid.

Overall, this means using a DDD entity class is much clearer to create and update than changing a few properties in a normal class. I love the clarity that the named methods provide – it’s obvious what I am doing. A minimal sketch of such an entity is shown below.
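This is not the Book App’s actual Book entity, just a minimal sketch of the DDD style described above with illustrative names; the real class returns a status object carrying validation errors rather than a simple string.

using System.Collections.Generic;

public class Review
{
    internal Review(int numStars, string comment, string voterName)
    {
        NumStars = numStars; Comment = comment; VoterName = voterName;
    }

    public int NumStars { get; private set; }
    public string Comment { get; private set; }
    public string VoterName { get; private set; }
}

public class Book
{
    private readonly List<Review> _reviews = new List<Review>();   // backing field

    public int BookId { get; private set; }
    public string Title { get; private set; }

    // Read-only view of the reviews – the only way to change them is via the method below
    public IReadOnlyCollection<Review> Reviews => _reviews.AsReadOnly();

    public string AddReview(int numStars, string comment, string voterName)
    {
        if (numStars < 0 || numStars > 5)
            return "NumStars must be between 0 and 5.";   // error – the entity stays valid
        _reviews.Add(new Review(numStars, comment, voterName));
        return null;                                      // null = success
    }
}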

I already gave you an example of the Books and Orders bounded contexts. That worked really well and was easy to implement once I understood how to use EF Core to map a class to a table as if it was a SQL view.
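I can’t reproduce the Book App’s exact configuration here, but one way to do this in EF Core (a sketch, with illustrative names) is to map a cut-down, read-only class onto the existing Books table using ToView, which also keeps it out of the Orders side’s migrations.

// Orders bounded context: a read-only "view" of a book, holding only the shared data
public class BookView
{
    public int BookId { get; private set; }
    public string Title { get; private set; }
    public decimal ActualPrice { get; private set; }
}

// In the Orders DbContext's OnModelCreating
modelBuilder.Entity<BookView>()
    .ToView("Books")               // reads from the existing Books table; not migrated
    .HasKey(b => b.BookId);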

2b. Domain-Driven Design – what was bad?

One of the downsides of DDD is you have to write more code. The figure below shows this by comparing a normal (non-DDD) approach on the left against a DDD approach on the right in an ASP.NET Core/EF Core application.

It’s not hard to see that you need to write more code, as the non-DDD version on the left is shorter. It might only be five or ten lines, but that mounts up pretty quickly when you have a real application with lots of entity classes and methods.

Having been using DDD for a long time I have built a library called EfCore.GenericServices, which reduces the amount of code you have to write. It does this by replacing the repository and the method call with a service. You still have to write the method in the entity class, but the library reduces the rest of the code to a DTO / ViewModel and a service call. The code below shows how you would use the EfCore.GenericServices library in an ASP.NET Core action method.

[HttpPost]
[ValidateAntiForgeryToken]
public async Task<IActionResult> AddBookReview(AddReviewDto dto, 
    [FromServices] ICrudServicesAsync<BookDbContext> service)
{
    if (!ModelState.IsValid)
    {
        return View(dto);
    }
    await service.UpdateAndSaveAsync(dto);

    if (service.IsValid)
        //Success path
        return View("BookUpdated", service.Message);

    //Error path
    service.CopyErrorsToModelState(ModelState, dto);
    return View(dto);
}

Over the years the EfCore.GenericServices library has saved me a LOT of development time. It can’t do everything, but it’s great at handling all the simple Create, Update and Delete work (known as CUD), leaving me to work on the more complex parts of the application.

2c. Domain-Driven Design – how did it fare under time pressure?

Yes, DDD does take a bit more time, but it’s (almost) impossible to bypass the way DDD works because of the design. In the Book App it worked really well, but I did use an extra approach known as domain events (see this article about this approach) which made some of the business logic easier to implement.

I didn’t break any DDD rules in the Book App, but on two projects I worked on, the clients found that calling DDD methods for every update didn’t work for their front-end. For instance, one client wanted to use JSON Patch to speed up the front-end (Angular) development.

To handle this I came up with a hybrid DDD style where non-business properties are read-write, but data with business logic/validation has to go through method calls. The hybrid DDD style is a bit of a compromise over DDD, but it certainly speeded up the development for my clients. In retrospect both projects worked well, but I do worry that the hybrid DDD style allows a developer in a hurry to make a property read-write when they shouldn’t. If every entity class is locked down, then no one can break the rules.

Conclusion

I always like to analyse any application I work on after the project is finished. That’s because when I am still working on a project there is often pressure (often from myself) to “get it done”. By analysing a project after it’s finished, or after a key milestone is met, I can look back more dispassionately and see if there is anything to learn from the project.

My main take-away from building the Book App is that the Modular Monolith approach was very good, and I would use it again. The modular monolith approach provides small, focused projects. That means I know all the code in the project is doing one job, and there are only links to other projects that are relevant to a project. That makes it easier to understand, and I’m much more inclined to refactor the old code as I’m less likely to break something else.

I would say that the hardest part of using the Modular Monolith approach was working out the naming convention of the projects, which only goes to prove the quote “There are only two hard things in Computer Science: cache invalidation and naming things” is still true😊.

DDD is an old friend to me and, while it needed a few more lines of code written, the result is a rock-solid design where every entity makes sure its data is in a valid state. Quite a lot of the validation and simple business logic can live inside the entity class, but business logic that cuts across multiple entities or bounded contexts can be a challenge. I have an approach to handle that, but I also used a new feature I learnt from a client’s project about a year ago and that helped too.

I hope this article helps you consider these two architectural approaches, or if you are using these approaches, they might spark some ideas. I’m sure many people have their own ideas or experiences so please do leave comments as I’m sure there is more to say on this subject.

Happy coding.

How to update a database’s schema without using EF Core’s migrate feature

Last Updated: January 27, 2021 | Created: January 26, 2021

This article is aimed at developers that want to use EF Core to access the database but want complete control over their database schema. I decided to write this article after seeing the EF Core Community standup covering the EF Core 6.0 Survey Results. In that video there was a page looking at the ways people deploy changes to production (link to video at that point), and quite a few respondents said they use SQL scripts to update their production database.

It’s not clear if people create those SQL scripts themselves or use EF Core’s Migrate SQL scripts, but Marcus Christensen commented during the Community standup (link to video at that point): “A lot of projects that I have worked on during the years, has held the db as the one truth for everything, so the switch to code first is not that easy to sell.”

To put that in context he was saying that some developers want to retain control of the database’s schema and have EF Core match the given database. EF Core can definitely do that, but in practice it gets a bit more complex (I know because I have done this on real-world applications).

TL;DR – summary

  • If you have a database that isn’t going to change much, then EF Core’s reverse engineering tool can create classes to map to the database and the correct DbContext and configurations.
  • If you are changing the database’s schema as the project progresses, then the reverse engineering on its own isn’t such a good idea. I cover three approaches to handle this:
    • Option 0: Have a look at EF Core 5’s improved migration feature to check whether it works for you – it will save you time if it can work for your project.
    • Option 1: Use the Visual Studio extension called EF Core Power Tools. This is reverse engineering on steroids and is designed for repeated database schema changes.
    • Option 2: Use the EfCore.SchemaCompare library. This lets you write EF Core code and update the database schema manually, and tells you where they differ.

Setting the scene – what are the issues of updating your database schema yourself?

If you have a database that isn’t changing, then EF Core’s reverse engineering tool is a great fit. This reads your SQL database and creates the classes to map to the database (I call these classes entity classes) and a class you use to access the database, with EF Core configurations/attributes to define things in EF Core to match your database.

That’s fine for a fixed database, as you can take the code the reverse engineering tool outputs and edit it to work the way you want it to. You can (carefully) alter the code that the reverse engineering tool produces to get the best out of EF Core’s features, like Value Converters, Owned types, Query Filters and so on.

The problems come if you are enhancing the database as the project progresses, because EF Core’s reverse engineering still works, but some things aren’t so good:

  1. The reverse engineering tool has no way to detect useful EF Core features, like Owned Types, Value Converters, Query Filters, Table-per-Hierarchy, Table-per-Type, table splitting, and concurrency tokens, which means you need to edit the entity classes and the EF Core configurations.
  2. You can’t edit the entity classes or the DbContext class because you will be replacing them the next time you reverse engineer your database. One way around this is to add to the entity classes with another class of the same name – that works because the entity classes are marked as partial (see the sketch after this list).
  3. The entity classes have all the possible navigational relationships added, which can be confusing if some navigational relationships would typically not be added because of certain business rules. Also, you can’t change the entity classes to follow a Domain-Driven Design approach.
  4. A minor point: you need to type in the reverse engineering command, which can be long, every time. I only mention it because Option 1 will solve that for you.
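As a minimal sketch of the partial-class workaround mentioned in point 2 (the class and property names are illustrative): the scaffolded file gets overwritten on every reverse engineering run, while your own partial class file is left alone.

// Book.cs – scaffolded by the reverse engineering tool; regenerated every time, so never edit it
public partial class Book
{
    public int BookId { get; set; }
    public string Title { get; set; }
}

// Book.Extra.cs – your own file in the same project/namespace; safe from regeneration
public partial class Book
{
    public string TitleWithId => $"{Title} ({BookId})";
}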

So, if you want to have complete control over your database you have a few options, one of which I created (see Option 2). I start with a non-obvious approach considering the title of this article – that is, using EF Core to create a migration and tweaking it. I think it’s worth a quick look at this to make sure you’re not taking on more work than you need to – simply skip Option 0 if you are sure you don’t want to use EF Core migrations.

Option 0: Use EF Core’s Migrate feature

I have just finished updating my book Entity Framework Core in Action to EF Core 5 and I completely rewrote many of the chapters from scratch because there was so much change in EF Core (or my understanding of EF Core) – one of those complete rewrites was the chapter on handling database migrations.

I have to say I wasn’t a fan of the EF Core migration feature, but after writing the migration chapter I’m coming around to using it. Partly that is because I have more experience of real-world EF Core applications, but also some of the new features, like MigrationBuilder.Sql(), give me more control over what a migration does.
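To show what that control looks like, here is a minimal sketch of a hand-tweaked migration (the migration name, table and column names are hypothetical): the scaffolded AddColumn call is kept, and a MigrationBuilder.Sql() call is added to back-fill the new column for existing rows, which the scaffolder could never work out on its own.

using Microsoft.EntityFrameworkCore.Migrations;

public partial class AddReviewsAverageVotes : Migration
{
    protected override void Up(MigrationBuilder migrationBuilder)
    {
        // Scaffolded by Add-Migration
        migrationBuilder.AddColumn<double>(
            name: "ReviewsAverageVotes",
            table: "Books",
            nullable: true);

        // Hand-added: fill the new column from the existing Review rows
        migrationBuilder.Sql(
            @"UPDATE b SET ReviewsAverageVotes =
                  (SELECT AVG(CAST(r.NumStars AS float))
                   FROM Review r WHERE r.BookId = b.BookId)
              FROM Books b");
    }

    protected override void Down(MigrationBuilder migrationBuilder)
    {
        migrationBuilder.DropColumn(name: "ReviewsAverageVotes", table: "Books");
    }
}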

The EF Core team want you to at least review, and possibly alter, a migration. Their approach is that the migration is rather like a scaffolded Razor Page (ASP.NET Core example), where it’s a good start, but you might want to alter it. There is a great video on EF Core 5 updated migrations and there was a discussion about this (link to video at the start of that discussion).

NOTE: If you decide to use the migration feature and manually alter the migration you might find Option 2 useful to double check your migration changes still matches EF Core’s Model of the database.

So, you might like to have a look at the EF Core migration feature to see if it might work for you. You don’t have to change much in the way you apply the SQL migration scripts, as the EF Core team recommends applying migrations via scripts anyway.

Option 1: Use Visual Studio’s extension called EF Core Power Tools

In my opinion, if you want to reverse engineer multiple times, then you should use Erik Ejlskov Jensen’s (known as @ErikEJ on GitHub and Twitter) Visual Studio extension called EF Core Power Tools. This allows you to run EF Core’s reverse engineering service via a friendly user interface, but that’s just the start. It provides a ton of options, some not even in EF Core’s reverse engineering, like reverse engineering SQL stored procs. All your options are stored, which makes subsequent reverse engineering just a select and click. The extension also has lots of other features, like creating a diagram of your database based on your entity classes and EF Core configuration.

I’m not going to detail all the features in the EF Core Power Tools extension because Erik has done that already. Here are two videos as a starting point, plus a link to the EF Core Power Tools documentation.

So, if you are happy with the general type of output that reverse engineering produces, then the EF Core Power Tools extension is a very helpful tool with extra features over the EF Core reverse engineering tool. EF Core Power Tools is also specifically designed for continuous changes to the database, and Erik used it that way in the company he was working for.

NOTE: I talked to Erik and he said they use a SQL Server database project (.sqlproj) to keep the SQL Server schema under source control, and the resulting SQL Server .dacpac files to update the database and EF Core Power Tools to update the code. See this article for how Erik does this.

Option 2: Use the EfCore.SchemaCompare library

The problem with any reverse engineering approach is that you aren't fully in control of the entity classes and the EF Core features. Just as developers want complete control over the database, I also want complete control of my entity classes and what EF Core features I can use. As Jeremy Likness said on the EF Core 6.0 survey video when database-first etc. were being talked about: "I want to model the domain properly and model the database properly and then use (EF Core) fluent API to map the two together in the right way" (link to video at that point).

I feel the same, and I built a feature I refer to as EfSchemaCompare – the latest version of this (I have versions going back to EF6!) is in the repo https://github.com/JonPSmith/EfCore.SchemaCompare. This library compares EF Core's view of the database, based on the entity classes and the EF Core configuration, against any relational database that EF Core supports. That works because, like EF Core Power Tools, I use EF Core's reverse engineering service to read the database, so there is no extra coding for me to do.

This library allows me (and you) to create my own SQL scripts to update the database while using any EF Core feature I need in my code. I can then run the EfSchemaCompare tool and it tells me if my EF Core code matches the database. If they don't match, it gives me detailed errors so that I can fix either the database or the EF Core code. Here is a simplified diagram of how EfSchemaCompare works.

The plus side of this is I can write my entity classes any way I like, and normally I use a DDD pattern for my entity classes. I can also use many of the great EF Core features like Owned Types, Value Converters, Query Filters, Table-per-Hierarchy, table splitting, and concurrency tokens in the way I want to. Also, I control the database schema – in the past I have created SQL scripts and applied them to the database using DbUp.

The downside is I have to do more work. The reverse engineering tool or the EF Core migrate feature could do part of the work, but I have decided I want complete control over the entity classes and the EF Core features I use. As I said before, I think the migration feature (and documentation) in EF Core 5 is really good now, but for complex applications, say working with a database that has non-EF Core applications accessing it, then the EfSchemaCompare tool is my go-to solution.

The README file in the EfCore.SchemaCompare repo contains all the documentation on what the library checks and how to call it. I typically create a unit test to check a database – there are lots of options that let you provide the connection string of the database you want to test against the entity classes and EF Core configuration provided by your application's DbContext class.
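
To give you an idea of the shape of such a unit test, here is a minimal sketch. The CompareEfSql class and its CompareEfWithDb/GetAllErrors members are how I remember the library's API, and BookContext stands in for your application's DbContext – please check the README for the exact names and the many options:

[Fact]
public void CompareDatabaseToDbContextModel()
{
    //SETUP - the connection string should point to the database you want to check
    var optionsBuilder = new DbContextOptionsBuilder<BookContext>()
        .UseSqlServer("Server=MyServer;Database=MyDatabase;Trusted_Connection=True");
    using var context = new BookContext(optionsBuilder.Options);

    //ATTEMPT
    var comparer = new CompareEfSql();
    var hasErrors = comparer.CompareEfWithDb(context);

    //VERIFY - on a mismatch the errors say what needs fixing
    hasErrors.ShouldBeFalse(comparer.GetAllErrors);
}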

NOTE: The EfCore.SchemaCompare library only works with EF Core 5. There is a version in the EfCore.TestSupport library, version 3.2.0, that works with EF Core 2.1 and EF Core 3.? and the documentation for that can be found in the Old-Docs page. This older version has more limitations than the latest EfCore.SchemaCompare version.

Conclusion

So, if you, like Marcus Christensen, consider “the database as the one truth for everything”, then I have described two (maybe three) options you could use. Taking charge of your database schema/update is a good idea, but it does mean you have to do more work.

Using EF Core's migration tool, with the ability to alter the migration, is the simplest, but some people don't like that. The reverse engineering/EF Core Power Tools approach is the next easiest, as it will write the EF Core code for you. But if you want to really tap into EF Core's features and/or DDD, then these approaches don't cut it. That's why I created the many versions of the EfSchemaCompare library.

I have used the EfSchemaCompare library on real-world applications, and I have also worked on client projects that used EF Core's migration feature. The migration feature is much simpler, but sometimes it's too easy, which means you don't think enough about what the best schema for your database would be. But that's not a problem with the migration feature, it's our desire/need to quickly move on, because you can change any migration EF Core produces if you want to.

I hope this article was useful to you on your usage of EF Core. Let me know your thoughts on this in the comments.

Happy coding.

Using ValueTask to create methods that can work as sync or async

Last Updated: January 25, 2021 | Created: January 23, 2021

In this article I delve into C#'s ValueTask struct, which provides a subset of the Task class features, and use its features to solve the problem of building libraries that need both sync and async versions of the library's methods. Along the way I learnt something about ValueTask and how it works with sync code.

NOTE: Thanks to Stephen Toub, who works on the Microsoft NET platform and wrote the article “Understanding the Whys, Whats, and Whens of ValueTask”, for confirming this is a valid approach and is used inside Microsoft’s code. His feedback, plus amoerie’s comment, helped me to improve the code to return the correct stack trace.

TL;DR – summary

  • Many of my libraries provide a sync and async version of each method. This can cause me to have to duplicate code, one for the sync call and one for the async call, with just a few different calls, e.g. SaveChanges and SaveChangesAsync
  • This article tells you how the ValueTask (and ValueTask<TResult>) works when it returns without running an async method, and what its properties mean. I also have some unit tests to check this.
  • Using this information, I found a way to use C#'s ValueTask to build a single method that works as a sync or async method, selected by a bool parameter. This removes a lot of duplicate code.
  • I have built some extension methods that will check that the returned ValueTask a) didn’t use an async call, and b) if an exception was thrown in the method (which won’t bubble up) it then throws it so that it does bubble up.

Setting the scene – why I needed methods to work sync or async

I have built quite a few libraries, NuGet says I have 15 packages, and most are designed to work with EF Core (a few are for EF6.x). Five of these have both sync and async versions of the methods to allow the developer to use it whatever way they want to. This means I have to build some methods twice: one for sync and one for async, and of course that leads to duplication of code.

Normally I can minimise the duplication by building internal methods that return IQueryable<T>, but when I developed the EfCore.GenericEventRunner library I wasn’t querying the database but running sync or async code provided by the developer. The internal methods normally have lots of code with one or two methods that could be sync or async, e.g. SaveChanges and SaveChangesAsync.
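
To show the contrast, the "share an IQueryable<T>" approach that normally removes the duplication looks roughly like this minimal sketch (the Book entity, its AuthorName property, and the method names are all made up for illustration):

public static class BookQueries
{
    //The shared query-building code used by both public methods
    private static IQueryable<Book> QueryBooksByAuthor(
        this DbContext context, string authorName)
        => context.Set<Book>().Where(b => b.AuthorName == authorName);

    public static List<Book> GetBooksByAuthor(
        this DbContext context, string authorName)
        => context.QueryBooksByAuthor(authorName).ToList();

    public static Task<List<Book>> GetBooksByAuthorAsync(
        this DbContext context, string authorName)
        => context.QueryBooksByAuthor(authorName).ToListAsync();
}

That pattern breaks down when the shared code itself has to call something like SaveChanges/SaveChangesAsync, which is the situation described above.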

Ideally, I wanted internal methods that I could call either sync or async, where a parameter told them whether to make the sync or the async calls – see the two call patterns below and the sketch that follows them:

  • SYNC:   var result = MyMethod(useAsync: false)
  • ASYNC: var result = await MyMethod(useAsync: true)
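
For concreteness, here is a minimal sketch of the shape of method I was after – one body, with the useAsync parameter picking the sync or async call (the Book class and the method name are made up for illustration):

private static async ValueTask<int> AddBookAndSave(
    DbContext context, Book book, bool useAsync)
{
    context.Add(book);
    return useAsync
        ? await context.SaveChangesAsync() //async path
        : context.SaveChanges();           //sync path - no real await happens
}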

I found the amazingly good article by Stephen Toub called "Understanding the Whys, Whats, and Whens of ValueTask", which explained ValueTask<TResult> and synchronous completion, and this got me thinking – can I use ValueTask to make a method that could work sync and async? And I could! Read on to see how I did this.

What happens when a ValueTask has synchronous completion?

The ValueTask (and ValueTask<TResult>) code is complex and linked to the Task class, and the documentation is rather short on explaining what a "failed operation" is. But from lots of unit tests and inspecting the internal data I worked out what happens with a sync return.

The ValueTask (and ValueTask<TResult>) have four bool properties. They are:

  • IsCompleted: This is true if the ValueTask has completed. So, if I captured the ValueTask, and this was true, then it had finished, which means I don't have to await it.
  • IsCompletedSuccessfully: This is true if no error happened. In a sync return it means no exception has been thrown.
  • IsFaulted: This is true if there was an error, and for a sync return that means an exception.
  • IsCanceled: This is true if a CancellationToken cancelled the async method. This is not used in a sync return.

From this information I decided I could check that a method had completed synchronously if the IsCompleted property is true.

The next problem was what to do when a method using ValueTask throws an exception. The exception isn't bubbled up but is held inside the ValueTask, so I needed to extract that exception to throw it. A bit more unit testing and inspecting the ValueTask internals showed me how to extract the exception and throw it. Information provided by Stephen Toub showed a better way to throw the exception with the correct stacktrace.

NOTE: You can see the unit tests I did to detect what ValueTask and ValueTask<TResult> do here.

So I could call my method via var valueTask = MyMethod(useAsync: false), inspect the returned valueTask to check it didn't call any async methods inside it, and then call GetResult, which will throw an exception if there is one. The code below does this for a ValueTask (this ValueTaskSyncCheckers class also contains a similar method for ValueTask<TResult>).

This approach comes from Microsoft code in which it is used (look at this code and search for "useAsync: false"). Stephen Toub told me that valueTask.GetAwaiter().GetResult(); is the best way to end a ValueTask, even for the version that doesn't return a result. That's because:

  • If there was an exception, then that call will throw the exception inside the method with the correct stacktrace.
  • Stephen Toub said that it should call GetResult even in the version with no result because, if your method is used in a pooled resource, that call is typically used to tell the pooled resource it is no longer needed.

The listing below shows the two versions of the CheckSyncValueTaskWorked methods – the first is for ValueTask and the second for ValueTask<TResult>.

public static void CheckSyncValueTaskWorked(
    this ValueTask valueTask)
{
    if (!valueTask.IsCompleted)
        throw new InvalidOperationException(
            "Expected a sync task, but got an async task");
    valueTask.GetAwaiter().GetResult();
}

public static TResult CheckSyncValueTaskWorkedAndReturnResult
    <TResult>(this ValueTask<TResult> valueTask)
{
    if (!valueTask.IsCompleted)
        throw new InvalidOperationException(
             "Expected a sync task, but got an async task");
    return valueTask.GetAwaiter().GetResult();
}

NOTE: You can access these extension methods via this link.

How I used this feature in my libraries

I first used this in my EfCore.GenericEventRunner library, but those examples are complex, so instead I show a very simple example from my EfCore.SoftDeleteServices library. Here is a method that uses the useAsync parameter – see the lines at the end of the code.

public static async ValueTask<TEntity> LoadEntityViaPrimaryKeys<TEntity>(this DbContext context,
    Dictionary<Type, Expression<Func<object, bool>>> otherFilters, 
    bool useAsync,
    params object[] keyValues)
    where TEntity : class
{
    // Lots of checks/exceptions left out 

    var entityType = context.Model.FindEntityType(typeof(TEntity));
    var keyProps = context.Model.FindEntityType(typeof(TEntity))
        .FindPrimaryKey().Properties
        .Select(x => x.PropertyInfo).ToList();

    var filterOutInvalidEntities = otherFilters
          .FormOtherFiltersOnly<TEntity>();
    var query = filterOutInvalidEntities == null
        ? context.Set<TEntity>().IgnoreQueryFilters()
        : context.Set<TEntity>().IgnoreQueryFilters()
            .Where(filterOutInvalidEntities);

    return useAsync
        ? await query.SingleOrDefaultAsync(
              CreateFilter<TEntity>(keyProps, keyValues))
        : query.SingleOrDefault(
              CreateFilter<TEntity>(keyProps, keyValues));
}

The following code shows the two versions – notice the sync version takes the ValueTask and then calls the CheckSyncValueTaskWorkedAndReturnResult method, while the async version uses the normal async/await approach.

SYNC VERSION

var entity= _context.LoadEntityViaPrimaryKeys<TEntity>(
    _config.OtherFilters, false, keyValues)
    .CheckSyncValueTaskWorkedAndReturnResult();
if (entity == null)
{
    //… rest of code left out

ASYNC VERSION

var entity = await _context.LoadEntityViaPrimaryKeys<TEntity>(
     _config.OtherFilters, true, keyValues);
if (entity == null) 
{
    //… rest of code left out

NOTE: I generally create the sync version of a library first, as it's much easier to debug – async exception stacktraces are hard to read and the debug data can be harder to inspect. Once I have the sync version working, with its unit tests, then I build the async side of the library.

Conclusion

So, I used this sync/async approach in my EfCore.GenericEventRunner library, where the code is very complex, and it really made the job much easier. I then used the same approach in the EfCore.SoftDeleteServices library – again there was a complex class called CascadeWalker, which "walks" the dependent navigational properties. In both cases this approach removed a significant amount of duplicated code.

You might not be building a library, but you have learnt what the ValueTask does when it returns a sync result to an async call. The ValueTask is there to make the sync return faster and, especially, to reduce memory usage. Also, you now have another approach if you have a similar sync/async need.

NOTE: ValueTask has a number of limitations, so I only use ValueTask in the internal parts of my libraries and provide a Task version to the users of my libraries.
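
As an illustration of that split, the public surface of such a library might wrap the internal useAsync method along these lines – the method names, and the _context/_config fields, are hypothetical:

//Public sync method - checks the ValueTask really did complete synchronously
public TEntity LoadByKeys<TEntity>(params object[] keyValues)
    where TEntity : class
    => _context.LoadEntityViaPrimaryKeys<TEntity>(
            _config.OtherFilters, false, keyValues)
        .CheckSyncValueTaskWorkedAndReturnResult();

//Public async method - exposes a Task, not a ValueTask, to the caller
public async Task<TEntity> LoadByKeysAsync<TEntity>(params object[] keyValues)
    where TEntity : class
    => await _context.LoadEntityViaPrimaryKeys<TEntity>(
            _config.OtherFilters, true, keyValues);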

In case you missed it do read the excellent article “Understanding the Whys, Whats, and Whens of ValueTask” which explained ValueTask. And thanks again to Stephen Toub and amoerie’s comment for improving the solution.

Happy coding.

New features for unit testing your Entity Framework Core 5 code

Last Updated: January 21, 2021 | Created: January 21, 2021

This article is about unit testing applications that use Entity Framework Core (EF Core), with the focus on the new approaches you can use in EF Core 5. I start out with an overview of the different ways of unit testing in EF Core and then highlight some improvements brought in EF Core 5, plus some improvements I added to the EfCore.TestSupport version 5 library that is designed to help unit testing of EF Core 5.

NOTE: I am using xUnit in my unit tests. xUnit is well supported in NET and widely used (EF Core uses xUnit for its testing).

TL;DR – summary

NOTE: In this article I use the term entity class (or entity instance) to refer to a class that is mapped to the database by EF Core.

Setting the scene – why, and how, I unit test my EF Core code

I have used unit testing since I came back to being a developer (I had a long stint as a tech manager) and I consider unit testing one of the most useful tools to produce correct code. And when it comes to database code, which I write a lot of, I started with a repository pattern which is easy to unit test, but I soon moved away from the repository pattern to using Query Objects. At that point I had to work out how to unit test my code that uses the database. This was critical to me as I wanted my tests to cover as much of my code as possible.

In EF6.x I used a library called Effort which mocks the database, but when EF Core came out, I had to find another way. I tried EF Core's In-Memory database provider, but EF Core's SQLite database provider using an in-memory database was MUCH better. The SQLite in-memory database (I cover this later) is very easy to use for unit tests, but it has some limitations, so sometimes I have to use a real database.

The reason why I unit test my EF Core code is to make sure it works as I expected. Typical things that I am trying to catch are:

  • Bad LINQ code: EF Core throws an exception if it can't translate my LINQ query into database code. That will make the unit test fail.
  • Database writes that didn't work: sometimes my EF Core create, update or delete didn't work the way I expected. Maybe I left out an Include or forgot to call SaveChanges. Reading the database back in the test catches these problems.

While some people don't like using unit tests on a database, my approach has caught many, many errors which would have been hard to catch in the application. Also, a unit test gives me immediate feedback when I'm writing code and continues to check that code as I extend and refactor it.

The three ways to unit test your EF Core code

Before I start on the new features, here is a quick look at the three ways you can unit test your EF Core code. The figure (which comes from chapter 17 of my book "Entity Framework Core in Action, 2nd edition") compares three ways to unit test your EF Core code, with the pros and cons of each.

As the figure says, using the same type of database in your unit test as your application uses is the safest approach – the unit test database accesses will respond just the same as your production system. In that case, why would I also list using an SQLite in-memory database in unit tests? While there are some limitations/differences from other databases, it does have many positives:

  1. The database schema is always up to date.
  2. The database is empty, which is a good starting point for a unit test.
  3. Running your unit tests in parallel works because each database is held locally in each test.
  4. Your unit tests will run successfully in the Test part of a DevOps pipeline without any other settings.
  5. Your unit tests are faster.

NOTE: Item 3, “running your unit tests in parallel”, is an xUnit feature which makes running unit tests much quicker. It does mean you need separate databases for each unit test class if you are using a non in-memory database. The library EfCore.TestSupport has features to obtain unique database names for SQL Server databases.

The last option, mocking the database, really relies on you using some form of repository pattern. I use this for really complex business logic where I build a specific repository pattern for the business logic, which allows me to intercept and mock the database access – see this article for this approach.

The new features in EF Core 5 that help with unit testing

I am going to cover four features that have changed in the EfCore.TestSupport library, either because of new features in EF Core 5 or because of improvements that have been added to the library.

  • Creating a unique, empty test database with the correct schema
  • How to make sure your EF Core accesses match the real-world usage
  • Improved SQLite in-memory options to dispose the database
  • How to check the SQL created by your EF Core code

Creating a unique, empty test database with the correct schema

To use a database in a xUnit unit test it needs to be:

  • Unique to the test class: that’s needed to allow for parallel running of your unit tests
  • Its schema must match the current Model of your application's DbContext: if the database schema is different to what EF Core thinks it is, then your unit tests aren't working on the correct database.
  • The database's data should be in a known state, otherwise your unit tests won't know what to expect when reading the database. An empty database is the best choice.

Here are three ways to make sure the database fulfils these three requirements.

  1. Use an SQLite in-memory which is created every time.
  2. Use a unique database name, plus calling EnsureDeleted, and then EnsureCreated.
  3. Use unique database name, plus call the EnsureClean method (only works on SQL Server).

1. Use an SQLite in-memory which is created every time.

The SQLite database has an in-memory mode, which is applied by setting the connection string to "Filename=:memory:". The database is then held within the connection, which makes it unique to its unit test, and the database hasn't been created yet. This is quick and easy, but if your production system uses another type of database then it might not work for you.
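
If you want to see what the library does for you, here is a minimal sketch of setting the options up by hand, along the lines of the EF Core docs (BookContext is the DbContext used in the examples below):

//Needs the Microsoft.EntityFrameworkCore.Sqlite NuGet package
//(SqliteConnection lives in the Microsoft.Data.Sqlite namespace)
var connection = new SqliteConnection("Filename=:memory:");
connection.Open(); //the in-memory database only lives while this connection is open

var options = new DbContextOptionsBuilder<BookContext>()
    .UseSqlite(connection)
    .Options;
using var context = new BookContext(options);
context.Database.EnsureCreated();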

The EF Core documentation on unit testing shows one way to set up an in-memory SQLite database, but I use the EfCore.TestSupport library's static method called SqliteInMemory.CreateOptions<TContext>, which will set up the options for creating an SQLite in-memory database, as shown in the code below.

[Fact]
public void TestSqliteInMemoryOk()
{
    //SETUP
    var options = SqliteInMemory.CreateOptions<BookContext>();
    using var context = new BookContext(options);

    context.Database.EnsureCreated();

    //Rest of unit test is left out
}

The database doesn't exist at this point, so you need to call EF Core's EnsureCreated method at the start of the test. This means you get a database that matches the current Model of your application's DbContext.

2. Unique database name, plus Calling EnsureDeleted, then EnsureCreated

If you use a normal (non in-memory) database, then you need to make sure the database has a unique name for each test class (xUnit runs test classes in parallel, but methods in a test class are run serially). To get a unique database name the EfCore.TestSupport library has methods that take a base SQL Server connection string from an appsettings.json file and add the class name to the end of the database name (see the code after the next paragraph, and these docs).

That solves the unique database name, and we solve the "matching schema" and "empty database" requirements by calling the EnsureDeleted method, then the EnsureCreated method. These two methods will delete the existing database and create a new database whose schema matches EF Core's current Model of the database. The EnsureDeleted / EnsureCreated approach works for all databases but is shown with SQL Server here.

[Fact]
public void TestEnsureDeletedEnsureCreatedOk()
{
    //SETUP
    var options = this.CreateUniqueClassOptions<BookContext>();
    using var context = new BookContext(options);
    
    context.Database.EnsureDeleted();
    context.Database.EnsureCreated();

    //Rest of unit test is left out
}

The EnsureDeleted / EnsureCreated approach used to be very slow (~10 seconds) for a SQL Server database, but since the new SqlClient came out in NET 5 this is much quicker (~ 1.5 seconds), which makes a big difference to how long a unit test would take to run when using this EnsureDeleted + EnsureCreated version.

3. Unique database name, plus call the EnsureClean method (only works on SQL Server).

While I was asking some questions on the EF Core GitHub, Arthur Vickers described a method that could wipe the schema of an SQL Server database. This clever method removes the current schema of the database by deleting all the SQL indexes, constraints, tables, sequences, UDFs and so on in the database. It then, by default, calls the EnsureCreated method to return a database with the correct schema and empty of data.

The EnsureClean method is deep inside EF Core's unit tests, but I extracted that code and built the other parts needed to make it useful, and it is available in version 5 of the EfCore.TestSupport library. The following listing shows how you use this method in your unit test.

[Fact]
public void TestSqlDatabaseEnsureCleanOk()
{
    //SETUP
    var options = this.CreateUniqueClassOptions<BookContext>();
    using var context = new BookContext(options);
    
    context.Database.EnsureClean();

    //Rest of unit test is left out
}

The EnsureClean approach is faster, maybe twice as fast as the EnsureDeleted + EnsureCreated version, which could make a big difference to how long your unit tests take to run. It is also better in situations where your database server won't allow you to delete or create new databases but does allow you to read/write a database, for instance if your test databases are on a SQL Server where you don't have admin privileges.

How to make sure your EF Core accesses match the real-world usage

Each unit test is a single method that has to a) set up the database ready for testing, b) run the code you are testing, and c) check that the results of the code you are testing are correct. And the middle part, running the code, must reproduce the situation in which the code you are testing is normally used. But because all three parts are in one method it can be difficult to create the same state that the tested code is normally used in.

The issue of “reproducing the same state the test code is normally used in” is common to unit testing, but when testing EF Core code this is made more complicated by the EF Core feature called Identity Resolution. Identity Resolution is critical in your normal code as it makes sure you only have one entity instance of a class type that has a specific primary key (see this example). The problem is that Identity Resolution can make your unit test pass even when there is a bug in your code.

Here is a unit test that passes because of Identity Resolution. The test of the Price at the end of the unit test should fail, because SaveChanges wasn't called (see the comment in the code). The reason it passed is that the query that filled the verifyBook variable didn't really read the database: EF Core found the tracked entity instance already inside the DbContext and returned that instead of the data in the database.

[Fact]
public void ExampleIdentityResolutionBad()
{
    //SETUP
    var options = SqliteInMemory
        .CreateOptions<EfCoreContext>();
    using var context = new EfCoreContext(options);

    context.Database.EnsureCreated();
    context.SeedDatabaseFourBooks();

    //ATTEMPT
    var book = context.Books.First();
    book.Price = 123;
    // Should call context.SaveChanges()

    //VERIFY
    var verifyBook = context.Books.First();
    //!!! THIS IS WRONG !!! THIS IS WRONG
    verifyBook.Price.ShouldEqual(123);
}

In the past we fixed this with multiple instances of the DbContext, as shown in the following code

public void UsingThreeInstancesOfTheDbcontext()
{
    //SETUP
    var options = SqliteInMemory         
        .CreateOptions<EfCoreContext>(); 
    options.StopNextDispose();
    using (var context = new EfCoreContext(options)) 
    {
        //SETUP instance
    }
    options.StopNextDispose();   
    using (var context = new EfCoreContext(options)) 
    {
        //ATTEMPT instance
    }
    using (var context = new EfCoreContext(options)) 
    {
        //VERIFY instance
    }
}

But there is a better way to do this with EF Core 5's ChangeTracker.Clear method. This method quickly removes all entity instances the DbContext is currently tracking. This means you can use one instance of the DbContext, but each stage, SETUP, ATTEMPT and VERIFY, is isolated, which stops Identity Resolution from giving you data from another stage. In the code below there are two potential errors that would slip through if you didn't add calls to the ChangeTracker.Clear method (or use multiple DbContexts).

  • The Include in the ATTEMPT stage: if the Include was left out the unit test would still pass (because the Reviews collection was set up in the SETUP stage).
  • The SaveChanges call: if the SaveChanges was left out the unit test would still pass (because the VERIFY read of the database would have been given the book entity from the ATTEMPT stage).

public void UsingChangeTrackerClear()
{
    //SETUP
    var options = SqliteInMemory
        .CreateOptions<EfCoreContext>();
    using var context = new EfCoreContext(options);

    context.Database.EnsureCreated();             
    var setupBooks = context.SeedDatabaseFourBooks();              

    context.ChangeTracker.Clear();                

    //ATTEMPT
    var book = context.Books                      
        .Include(b => b.Reviews)
        .Single(b => b.BookId == setupBooks.Last().BookId);
    book.Reviews.Add(new Review { NumStars = 5 });

    context.SaveChanges();                        

    //VERIFY
    context.ChangeTracker.Clear();                

    context.Books.Include(b => b.Reviews)         
        .Single(b => b.BookId == setupBooks.Last().BookId)
        .Reviews.Count.ShouldEqual(3);            
}

This is much better than the three separate DbContext instances because

  1. You don't have to create the three DbContext scopes (saves typing; shorter unit test)
  2. You can use using var context = …, so no indents (nicer to write; easier to read)
  3. You can still refer to previous parts, say to get an entity's primary key (see the use of setupBooks in the ATTEMPT and VERIFY stages)
  4. It works better with the improved SQLite in-memory disposable options (see next section)

Improved SQLite in-memory options to dispose the database

You have already seen the SqliteInMemory.CreateOptions<TContext> method earlier but in version 5 of the EfCore.TestSupport library I have updated it to dispose the SQLite connection when the DbContext is disposed. The SQLite connection holds the in-memory database so disposing it makes sure that the memory used to hold the database is released.

NOTE: In previous versions of the EfCore.TestSupport library I didn’t do that, and I haven’t had any memory problems. But the EF Core docs say you should dispose the connection, so I updated the SqliteInMemory options methods to implement the IDisposable interface.

It turns out that disposing the DbContext instance will dispose the options instance, which in turn disposes the SQLite connection. See the comments at the end of the code.

[Fact]
public void TestSqliteInMemoryOk()
{
    //SETUP
    var options = SqliteInMemory.CreateOptions<BookContext>();
    using var context = new BookContext(options);

    //Rest of unit test is left out
} // context is disposed at end of the using var scope,
  // which disposes the options that was used to create it, 
  // which in turn disposes the SQLite connection

NOTE: If you use multiple instances of the DbContext based on the same options instance, then you need to use one of these approaches to delay the dispose of the options until the end of the unit test.

How to check the SQL created by your EF Core code

If I'm interested in the performance of some part of the code I am working on, it is often easier to look at the SQL commands in a unit test than in the actual application. EF Core 5 provides two ways to do this:

  1. The ToQueryString method to use on database queries
  2. Capturing EF Core’s logging output using the LogTo method

1. The ToQueryString method to use on database queries

The ToQueryString method will turn an IQueryable variable built using EF Core's DbContext into a string containing the database commands for that query. Typically, I output this string to xUnit's output window so I can check it (see the WriteLine call in the VERIFY stage). The code below also contains a test of the created SQL code – I don't normally do that, but I put it in so you could see the type of output you get.

public class TestToQueryString
{
    private readonly ITestOutputHelper _output;

    public TestToQueryString(ITestOutputHelper output)
    {
        _output = output;
    }

    [Fact]
    public void TestToQueryStringOnLinqQuery()
    {
        //SETUP
        var options = SqliteInMemory.CreateOptions<BookDbContext>();
        using var context = new BookDbContext(options);
        context.Database.EnsureCreated();
        context.SeedDatabaseFourBooks();

        //ATTEMPT 
        var query = context.Books.Select(x => x.BookId); 
        var bookIds = query.ToArray();                   

        //VERIFY
        _output.WriteLine(query.ToQueryString());        
        query.ToQueryString().ShouldEqual(               
            "SELECT \"b\".\"BookId\"\r\n" +              
            "FROM \"Books\" AS \"b\"\r\n" +              
            "WHERE NOT (\"b\".\"SoftDeleted\")");        
        bookIds.ShouldEqual(new []{1,2,3,4});            
    }
}

2. Capturing EF Core’s logging output using the LogTo method

EF Core 5 makes it much easier to capture the logging that EF Core outputs (before, you needed to create an ILoggerProvider class and register it with EF Core). Now you can add the LogTo method to your options and it will return a string output for every log. The code below shows how to do this, with the logs output to xUnit's window.

[Fact]
public void TestLogToDemoToConsole()
{
    //SETUP
    var connectionString = 
        this.GetUniqueDatabaseConnectionString();
    var builder =                                   
        new DbContextOptionsBuilder<BookDbContext>()
        .UseSqlServer(connectionString)             
        .EnableSensitiveDataLogging()
        .LogTo(_output.WriteLine);


    // Rest of unit test is left out

LogTo has lots of different ways to filter and format the output (see the EF Core docs here). I created versions of the EfCore.TestSupport library's SQLite in-memory and SQL Server option builders that use LogTo, and I decided to use a class, called LogToOptions, to manage all the filters/formats for LogTo (LogTo requires calls to different methods for these). This allowed me to define better defaults for the logging output (a) Information, not Debug, log level; b) no datetime included in the output) and to make it easier to change the filters/formats.

I also added a feature that I use a lot: the ability to turn the log output on or off. The code below shows the SQLite in-memory option builder with the LogTo output, plus the ShowLog feature. This only starts logging output once the database has been created and seeded – see the ShowLog = true line in the ATTEMPT stage.

[Fact]
public void TestEfCoreLoggingCheckSqlOutputShowLog()
{
    //SETUP
    var logToOptions = new LogToOptions
    {
        ShowLog = false
    };
    var options = SqliteInMemory
         .CreateOptionsWithLogTo<BookContext>(
             _output.WriteLine, logToOptions); //the LogToOptions instance is passed in so ShowLog can control the output
    using var context = new BookContext(options);
    context.Database.EnsureCreated();
    context.SeedDatabaseFourBooks();

    //ATTEMPT 
    logToOptions.ShowLog = true;
    var book = context.Books.Single(x => x.Reviews.Count() > 1);

    //Rest of unit test left out
}

Information for existing users of the EfCore.TestSupport library

It's no longer possible to detect the EF Core version via netstandard, so now it is done via the first number in the library's version. For instance, EfCore.TestSupport version 5.?.? works with EF Core 5.?.?.* At the same time the library was getting hard to keep up to date, especially with EfSchemaCompare in it, so I took the opportunity to clean up the library.

BUT that clean-up includes BREAKING CHANGES, mainly around the SQLite in-memory option builders. If you use SqliteInMemory.CreateOptions you MUST read this document to decide whether you want to upgrade or not.

NOTE: You may not be aware, but the NuGet packages in your test projects override the same NuGet packages installed by the EfCore.TestSupport library. So, as long as you add the newest versions of the EF Core NuGet libraries, the EfCore.TestSupport library will use those. The only part that won't run is EfSchemaCompare, but that has now got its own library so you can use that directly.

Conclusion

I just read a great article on the Stack Overflow blog which said:

“Which of these bugs would be easier for you to find, understand, repro, and fix: a bug in the code you know you wrote earlier today or a bug in the code someone on your team probably wrote last year? It's not even close! You will never, ever again debug or understand this code as well as you do right now, with your original intent fresh in your brain, your understanding of the problem and its solution space rich and fresh.”

I totally agree and that’s why I love unit tests – I get feedback as I am developing (I also like that unit tests will tell me if I broke some old code too). So, I couldn’t work without unit tests, but at the same time I know that unit tests can take a lot of development time. How do I solve that dilemma?

My answer to the extra development time isn't to write fewer unit tests, but to build a library and develop patterns that make me really quick at unit testing. I also try to make my unit tests fast to run – that's why I worry about how long it takes to set up the database, and why I like xUnit's parallel running of unit tests.

The changes to the EfCore.TestSupport library and my unit test patterns due to EF Core 5 are fairly small, but each one reduces the number of lines I have to write for each unit test or makes the unit test easier to read. I think the ChangeTracker.Clear method is the best improvement because it does both – it’s easier to write and easier to read my unit tests.

Happy coding.

Updating many-to-many relationships in EF Core 5 and above

Last Updated: March 23, 2021 | Created: January 14, 2021

EF Core 5 added a direct many-to-many relationship with zero configuration (hurrah!). This article describes how to set up this direct many-to-many relationship and why (and how) you might want to configure it. I also cover the original many-to-many relationship (referred to as the indirect many-to-many from now on) where you need to create the class that acts as the linking table between the two classes.

You might be mildly interested that this is the third iteration of this article.  I wrote the first article on many-to-many on EF6.x in 2014, and another many-to-many article for EF Core in 2017. All of these got a lot of views, so I had to write a new article once EF Core 5 came out. I hope you find it useful.

All the information and the code come from Chapter 2 of the second edition of my book, Entity Framework Core in Action, which covers EF Core 5. In this book I build a book selling site, called Book App, where each book has two many-to-many relationships:

  1. A direct many-to-many relationship to a Tag entity class (I refer to classes that EF Core maps to a database as entity classes). A Tag holds a category (for instance: Microsoft .NET or Web) which allows users to pick books by their topic.
  2. An indirect many-to-many relationship to the Author entity class, which provides an ordered list of the Authors on the book, for instance: by Dino Esposito, Andrea Saltarello.

Here is an example of how it displays each book to the user – this is a fictitious book I used for many of my tests.

Overall summary and links to each section summary

For people who are in a hurry I have ended each section with a summary. Here are the links to the summaries:

The overall summary is:

  • Direct many-to-many relationships are super simple to configure and use, but by default you can't access the linking table.
  • Indirect many-to-many relationships take more work to set up, but you can access the linking table. This allows you to put specific data in the linking table, such as the order in which you want to read the items back.

Here are the individual summaries (with links).

NOTE: All the code you see in this article comes from the companion GitHub repo to my book Entity Framework Core in Action. Here is a link to the directory the entity classes are in, and many of the code examples come from the Ch03_ManyToManyUpdate and Ch03_ManyToManyCreate unit test classes.

Setting the scene – the database and the query

Let's start by seeing the finished database and how the queries work. You can skip this, but having an overall view of what is going on may help you when you are looking at the specific part you are interested in. Let's start with the database.

This shows the two many-to-many relationships – both have a linking table, but the direct many-to-many has its linking table created by EF Core.

Next, let’s see the many-to-many queries and how they relate to the book display in the figure below.

You can see that the Book's Authors (top left) need to be ordered – that Order property (a byte) is in the linking entity class. But for the Book's Tags (bottom left), which don't have an order, the query is much simpler to write because EF Core will automatically add the extra SQL needed to use the hidden linking table.

Now we get into the detail of setting up and using both of these types of many-to-many relationships.

Direct many-to-many setup – normal setup.

The setting up of the direct many-to-many relationship is done automatically (this is known as By Convention configuration in EF Core).  And when you create your database via EF Core, then it will add the linking table for you.

This is super simple to do – so much easier than the indirect many-to-many. But if you want to add extra data in the linking table, say for ordering or filtering, then you either alter the direct many-to-many or use the indirect many-to-many approach.

NOTE: The direct many-to-many relationship is only automatically configured if you have a collection navigational property on both ends. If you only want a navigational property on one end, then you will need to use the Fluent API to configure it (see the next section), for instance …HasMany(x => x.Tags).WithMany() where the Tag class has no navigational property back to the Books.

Direct many-to-many setup: When you want to define the linking table

You can define an entity class and configure the linking table, but I will say that if you are going to do that you might as well use the indirect approach as I think it’s easier to set up and use.

Typically, you would only define the linking table if you wanted to add extra data. There are two steps in this process:

1. Creating a class to map to the linking table

Your entity class must have the two primary/foreign keys from each end of the many-to-many link, in this case the BookId and the TagId. The code below defines the minimum properties needed to be the linking table – you can add extra properties as normal, but I leave that to you.

public class BookTag
{
    public int BookId { get; set; }

    [Required]
    [MaxLength(40)]
    public string TagId { get; set; }

    //You can add extra properties here

    //relationships

    public Book Book { get; private set; }
    public Tag Tag { get; private set; }
} 

You could add properties such as the Order property needed for the Author ordering, or maybe a property to use for soft delete. That’s up to you and doesn’t affect the configuration step that comes next.

2. Configuring the linking table in the OnModelCreating method

Now you have to configure the many-to-many linking class/table with the UsingEntity method in the OnModelCreating method in your application’s DbContext, as shown in the code below.

public class EfCoreContext : DbContext
{
    //Other code left out to focus on many-to-many
 
    protected override void OnModelCreating(ModelBuilder modelBuilder) 
    {
        //Other configuration left out to focus on many-to-many
 
        modelBuilder.Entity<Book>().HasMany(x => x.Tags)
                .WithMany(x => x.Books)
                .UsingEntity<BookTag>(
                    bookTag => bookTag.HasOne(bt => bt.Tag)
                        .WithMany().HasForeignKey(bt => bt.TagId),
                    bookTag => bookTag.HasOne(bt => bt.Book)
                        .WithMany().HasForeignKey(bt => bt.BookId));
    }
}

You can see the EF Core document on this here.

NOTE: I really recommend an excellent video produced by the EF Core team which has a long section on the new, direct many-to-many, including how to configure it to include extra data.

Direct many-to-many usage – querying

Querying the direct many-to-many relationship is quite normal. Here are some example queries:

  • Load all the Books with their Tags
    var books = context.Books.Include(b => b.Tags).ToList()
  • Get the TagIds (each TagId holds a category name) for each book
    var bookTagIds = context.Books
        .Select(b => b.Tags.Select(t => t.TagId).ToList()).ToList()

EF Core will detect that your query is using a direct many-to-many relationship and add the extra SQL to use the hidden linking table to get the correct entity instances on the other end of the many-to-many relationship.

Direct many-to-many usage: Add a new link

Adding another many-to-many link to an existing entity class is easy – you just add the existing entry into the direct many-to-many navigational collection property. The code below shows how to add an existing Tag to a book that already has one Tag.

var book = context.Books
    .Include(p => p.Tags)
    .Single(p => p.Title == "Quantum Networking"); 

var existingTag = context.Tags         
    .Single(p => p.TagId == "Editor's Choice");

book.Tags.Add(existingTag);
context.SaveChanges();

When you add the existing Tag into the Tags collection EF Core works out you want a linking entry created between the Book and the Tag. It then creates that new link.

A few things to say about this:

  • You should load the existing Tags using the Include method, otherwise you could lose any existing links to Tags.
  • You MUST load the existing Tag from the database to add to the Tags navigational collection. If you simply created a new Tag, then EF Core will add that new Tag to the database.

ADVANCE NOTES on navigational collection properties

Point 1: Let me explain why I say “You should load the existing Tags…” above. There are two situations:

  • If you add an empty navigational collection on the initialization of the class, then you don't have to add the Include method, as an Add will work (but I don't recommend this – see below).
  • If your navigational collection is null after construction, then you MUST load the navigational collection, otherwise your code will fail.

Overall, I recommend loading the navigational collection using the Include method even if the navigational collection has been initialized to an empty collection. Without the Include the entity in memory doesn't match the database, which I try to avoid, as a future refactor might assume it did match the database.

Point 2: If you are adding a new entry to (or removing an existing linking relationship from) a collection with LOTs of items in it, then you might have a performance issue with using an Include. In this case you can create (or, for removing a link, delete – see below) the linking table entry directly. For a direct many-to-many relationship, you would need to create a property bag of the right form to add, as sketched below.
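
For the direct many-to-many that could look something like the following sketch. It uses EF Core 5's shared-type (property bag) entity for the auto-created linking table; the entity type name "BookTag" and the property names "BooksId"/"TagsId" are my guesses at what the conventions produce for this model, so inspect context.Model (for instance via context.Model.FindEntityType("BookTag")) to see the names EF Core actually chose:

//Add one linking row without loading the Book's Tags collection
var bookTagSet = context.Set<Dictionary<string, object>>("BookTag");
bookTagSet.Add(new Dictionary<string, object>
{
    ["BooksId"] = bookId, //primary key of the Book you are linking (name is a guess)
    ["TagsId"] = tagId    //primary key of the Tag you are linking (name is a guess)
});
context.SaveChanges();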

NOTE These ADVANCE NOTES also apply to the adding a new indirect many-to-many link.

Direct many-to-many usage: Remove a link

To remove a link to an entity that is already in the navigational property collection, you simply remove that entity instance from the collection. The code below shows removing an existing Tag using the Remove method.

var book = context.Books
    .Include(p => p.Tags)
    .First();

var tagToRemove = book.Tags
    .Single(x => x.TagId == "Editor's Choice");
book.Tags.Remove(tagToRemove);
context.SaveChanges();

This is just like adding a link, but in this case EF Core works out what linking entity needs to be deleted to remove this relationship.

Direct many-to-many usage: Create Book with tags

To add tags when you first create a book you just add each tag to the Tags collection. The code below adds two existing Tags to a new book (note that I haven’t set up the Author – see this part for how you do that).

var existingTag1 = context.Tags.Single(t => t.TagId == "Tag1");
var existingTag2 = context.Tags.Single(t => t.TagId == "Tag2");
var newBook = new Book()
{
    Title = "My Book",
    //... other property settings left out
    
    //Set your Tags property to an empty collection
    Tags = new List<Tag>()
};
newBook.Tags.Add(existingTag1);
newBook.Tags.Add(existingTag2);
context.Add(newBook);
context.SaveChanges();

Indirect many-to-many setup – configuring the linking table

An indirect many-to-many relationship takes a bit more work, but it does allow you to use extra data that you can put into the linking table. The figure below shows the three entity classes, Book, BookAuthor, and Author, which define the many-to-many relationship.

This is more complex because you need to define the linking entity class, BookAuthor, so that you can add extra data in the linking table and also access that extra data when you query the data.

EF Core will automatically detect the relationships because of all the navigational properties. But the one thing it can’t automatically detect is the composite primary key in the BookAuthor entity class. This code below shows how to do that.

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder.Entity<BookAuthor>() 
        .HasKey(x => new {x.BookId, x.AuthorId});
} 

NOTE: Like the direct many-to-many configuration, if you leave out any of the four navigational properties, then it won't set up that part of the many-to-many. You will then have to add Fluent API commands to set up the relationships, as sketched below.
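
As a hedged sketch (using the property names from the entity classes above), if the Author class had no collection navigational property back to BookAuthor, the Fluent API configuration might look like this:

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder.Entity<BookAuthor>()
        .HasKey(x => new { x.BookId, x.AuthorId });

    modelBuilder.Entity<BookAuthor>()
        .HasOne(ba => ba.Book)
        .WithMany(b => b.AuthorsLink)
        .HasForeignKey(ba => ba.BookId);

    modelBuilder.Entity<BookAuthor>()
        .HasOne(ba => ba.Author)
        .WithMany()   //assumes Author has no collection back to BookAuthor
        .HasForeignKey(ba => ba.AuthorId);
}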

Indirect many-to-many usage – querying

The indirect query is more complex, but that's because you want to order the Authors' names.

  • Load all the Books with their BookAuthor and Author entity classes
    var books = context.Books
         .Include(book => book.AuthorsLink)
         .ThenInclude(ba => ba.Author)
         .ToList();
  • Load all the Books with their BookAuthor and Author entity classes, and make sure the Authors are in the right order
    var books = context.Books
         .Include(book => book.AuthorsLink.OrderBy(ba => ba.Order))
         .ThenInclude(ba => ba.Author)
         .ToList();
  • Get each Book's Title with the Authors' names ordered and then returned as a comma-delimited string
    var books = context.Books.Select(book => new
    {
        Title = book.Title,
        AuthorsString = string.Join(", ",
            book.AuthorsLink.OrderBy(ba => ba.Order)
                .Select(ba => ba.Author.Name))
    }).ToList();

NOTE: ordering within the Include method is also a new feature in EF Core 5.

Indirect many-to-many usage – add a new link

To add a new many-to-many relationship link you need to add a new instance of the linking entity class, in our example that is a BookAuthor entity class, and set up the two relationships, in this example filling in the Book and Author singleton navigational properties. This is shown in the code below, where we set the Order to a value that adds the new Author on the end (the first Author has an Order of 0, the second Author is 1, and so on).

var existingBook = context.Books                           
    .Include(p => p.AuthorsLink)                   
    .Single(p => p.Title == "Quantum Networking"); 

var existingAuthor = context.Authors          
    .Single(p => p.Name == "Martin Fowler");

existingBook.AuthorsLink.Add(new BookAuthor  
{
    Book = existingBook,  
    Author = existingAuthor,  
    // We set the Order to add this new Author on the end
    Order = (byte) existingBook.AuthorsLink.Count
});
context.SaveChanges();

A few things to say about this (the first two are the same as the direct many-to-many add):

  • You should load the Book’s AuthorsLink using the Include method, otherwise you will lose any existing links to Authors.
  • You MUST load the existing Author from the database to add to the BookAuthor linking entity. If you simply created a new Author, then EF Core will add that new Author to the database.
  • Technically you don't need to set the BookAuthor's Book navigational property because you added the new BookAuthor instance to the Book's AuthorsLink, which also tells EF Core that this BookAuthor is linked to the Book. I put it in to make it clear what the Book navigational property does.

Indirect many-to-many usage – removing a link

To remove a many-to-many link, you need to remove (delete) the linking entity. In this example I have a book with two Authors, and I remove the link to the last Author – see the code below.

var existingBook = context.Books
    .Include(book => book.AuthorsLink
        .OrderBy(x => x.Order))
    .Single(book => book.BookId == bookId);

var linkToRemove = existingBook.AuthorsLink.Last();
context.Remove(linkToRemove);
context.SaveChanges();

This works, but you have the problem of making sure the Order values are correct. In the example code I deleted the last BookAuthor linking entity so it wasn’t a problem, but if I deleted any BookAuthor other than the last I should recalculate the Order values for all the Authors, otherwise a later update might get the Order of the Authors wrong.
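
A minimal sketch of that renumbering, done before calling SaveChanges, might look like this (it reuses the existingBook and linkToRemove variables from the code above):

//Renumber the remaining links so the Order values stay 0, 1, 2...
byte order = 0;
foreach (var authorLink in existingBook.AuthorsLink
    .Where(ba => ba != linkToRemove)
    .OrderBy(ba => ba.Order))
{
    authorLink.Order = order++;
}
context.SaveChanges();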

NOTE: You can remove the BookAuthor by removing it from the Book's AuthorsLink collection, like you did for the direct many-to-many remove. Both approaches work.

Indirect many-to-many usage – Create Book with Authors

To add Authors when you first create a book you need to add a BookAuthor linking class for each author in the book, setting the Order property to define the order that the Authors should be displayed in. The code below adds two existing Authors to a new book.

var existingAuthor1 = context.Authors
    .Single(a => a.Name == "Author1");
var existingAuthor2 = context.Authors
    .Single(a => a.Name == "Author2");
var newBook = new Book()
{
    Title = "My Book",
    //... other property settings left out

    //Set your AuthorsLink property to an empty collection
    AuthorsLink = new List<BookAuthor>()
};
newBook.AuthorsLink.Add(new BookAuthor
{
    Book = newBook,
    Author = existingAuthor1,
    Order = 0  //First author
});
newBook.AuthorsLink.Add(new BookAuthor
{
    Book = newBook,
    Author = existingAuthor2,
    Order = 1  //second author
});
context.Add(newBook);
context.SaveChanges();

Conclusion

So, since EF Core 5, you have two ways to set up a many-to-many – the original indirect approach (Book-BookAuthor-Author) and the new direct (Book-Tags) approach. The new direct many-to-many is really easy to use, but as you have seen, sometimes the original indirect approach is the way to go when you want to do more than a simple link between two entity classes.

If you didn’t find this link before, I really recommend an excellent video produced by the EF Core team which has a long section on the new, direct many-to-many, including how to configure it to include extra data.

All the best with your EF Core coding and do have a look at my GitHub page to see the various libraries I have created to help build and test EF Core applications.

Happy coding.