Entity Framework Core: Client vs. Server evaluation

Entity Framework Core (EF Core) is a new, ground-up rewrite of the highly successful Entity Framework 6 (EF6.x) database access library for the .NET platform. I have been commissioned by Manning Publications to write a book on EF Core called Entity Framework Core in Action. In writing this book I have looked at all aspects of EF Core, and I am really impressed by the topic I am going to cover in this article, that is, Client vs. Server evaluation.

Client vs. Server evaluation is a feature that makes it easier to write queries to the database, because it allows you to include methods in your query that cannot be run on the database, that is, methods that EF Core cannot convert to SQL commands.

In my Entity Framework Core in Action book I use a database containing various technical books as an example (it seemed appropriate) – you can see a live site at http://efcoreinaction.com/. The diagram below shows just the Book table with a many-to-many link to each Book’s Author(s).

Now I want to list a book, with its primary key, title and its Author(s) as a comma-delimited string. The code below does this, in this case taking the first book in the database:

var book = context.Books
    .Select(p => new
    {
        p.BookId,
        p.Title,
        AuthorsString = string.Join(", ",
            p.AuthorsLink
                .OrderBy(q => q.Order)
                .Select(q => q.Author.Name))
    })
    .First();

A typical output would be:

BookId  Title            AuthorsString
------  ---------------  --------------------------
1       Beginning Scala  Vishal Layka, David Pollak

What happens is that EF Core converts the query into SQL to run on the database, but one method, string.Join, isn’t supported as a SQL command by EF Core’s SQL database provider, and this is where Client vs. Server evaluation comes in.

To handle this, EF Core sorts the commands into two groups:

  1. Commands that can be converted into SQL commands and run on the database server
  2. Commands that cannot be converted into SQL, and therefore must be run in software in the application, known as the client.

The figure below gives you a better view of this process. It shows the effect of running the EF Core query I just listed on a book that has two authors, Jack and Jill. As you can see, the AuthorsString property will be set to “Jack, Jill”, while the BookId and Title will be set to the values of the corresponding columns in the Books table.

Stage 1, the translation of the LINQ query, is where (with the help of the database provider) EF Core splits up the LINQ commands and other methods into either “run on database” (Server) or “run in software” (Client).

Stage 2 is where the SQL commands are run on the database server and the values are read back. The SQL that EF Core creates is given below.

SELECT TOP(1) [p].[BookId], [p].[Title]
FROM [Books] AS [p]

SELECT [q.Author].[Name]
FROM [BookAuthor] AS [q]
INNER JOIN [Authors] AS [q.Author] ON 
    [q].[AuthorId] = [q.Author].[AuthorId]
WHERE @_outer_BookId = [q].[BookId]
ORDER BY [q].[Order]

Note that EF Core always reads collections using a separate SQL command (EF6.x would have output a single command to read it all, but that can be inefficient in cases where the collection is large).

Stage 3 is run because EF Core detected commands that must be run on the Client. It takes the data returned by the SQL commands and runs the software-only methods on the Client.

In stage 4 you see that EF Core combines the Server data and the Client data and provides the composite data back to the caller.

The Client vs. Server evaluation feature gives you, as a developer, the ability to create complex queries, and EF Core will optimize the query to run as much of it as it can on the database Server. And if there is some method in your query that cannot be run on the database server, the query won’t fail; EF Core will simply apply that method after the database server has done its part.

The example I showed is simple, but the possibilities are endless. But there are a few things to watch out for.

What about Entity Framework 6.x (EF6.x)?

The Client vs. Server evaluation feature is new in EF Core, and is not available in EF6.x. This means that in EF6.x I have to add extra stages to my query if I want to use methods such as string.Join, which makes such queries harder work. In EF Core this gets much easier.
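For example, here is a minimal sketch (based on the query shown earlier, not code from the book) of the two-stage approach EF6.x forces on you: read the raw author names from the database first, then apply string.Join in software.

var tempData = context.Books
    .Select(p => new
    {
        p.BookId,
        p.Title,
        //the ordering and projection can be translated to SQL
        AuthorNames = p.AuthorsLink
            .OrderBy(q => q.Order)
            .Select(q => q.Author.Name)
    })
    .First();
//second, in-software stage: string.Join runs after the SQL has completed
var book = new
{
    tempData.BookId,
    tempData.Title,
    AuthorsString = string.Join(", ", tempData.AuthorNames)
};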

If you are still on EF6.x then I can recommend Alexander Zaytsev’s DelegateDecompiler (DD for short), which provides a similar feature (see this article, where I use DD). While I have used DD, it does require me to add a command at the end of every query. Also, DD is not (currently) available on .NET Core.

Understanding the limitations of Client vs. Server evaluation

I think the Client vs. Server evaluation feature is a very useful addition to EF. But, like all powerful features, it is best to understand what is going on so that you can use it in the right way.

Firstly, the obvious point: the method you provide is run on every entity (row) you read from the database. If you have 10,000 rows in the database and don’t filter/limit what is loaded then, as well as a SQL command that takes a long time, your processor will spend a long time running your method 10,000 times.
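For instance, here is a minimal sketch (using the same Book entity as before) of limiting the rows on the server before the client-side method runs:

var books = context.Books
    .Take(10)                     //the row limit runs on the database server
    .Select(p => new
    {
        p.Title,
        AuthorsString = string.Join(", ",   //string.Join runs in the client,
            p.AuthorsLink                   //but only on the 10 rows returned
                .OrderBy(q => q.Order)
                .Select(q => q.Author.Name))
    })
    .ToList();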

The second point is subtler: the Client vs. Server evaluation feature blurs the line between what is run in the database and what is run in the client. It is possible to create a query that works, but is slower than it could be because it has to use client-side evaluation. To give you some context, in EF6.x this form of mixed client/server query would have failed, because EF6.x did not support it. That meant you had to do something about it – often by changing the query to better suit the database. Now your query may work, but it could perform worse than one written so that EF Core can convert it directly to SQL commands.

One extreme example of the problem is that Client vs. Server evaluation allows you to sort on a client-side evaluated property, which means the sorting is done in the client rather than in the database server. I tried this by replacing the .First() command in the query shown earlier with .OrderBy(p => p.AuthorsString) and returning a list of books. In that case EF Core produces SQL code that reads all the books, then reads each row individually, twice, which is definitely not optimal.
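Here is a sketch of that problem query, so you can see what triggers it:

//sorting on AuthorsString cannot be translated to SQL, so EF Core
//has to read every book and run the sort in the client
var books = context.Books
    .Select(p => new
    {
        p.BookId,
        p.Title,
        AuthorsString = string.Join(", ",
            p.AuthorsLink
                .OrderBy(q => q.Order)
                .Select(q => q.Author.Name))
    })
    .OrderBy(p => p.AuthorsString)
    .ToList();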

Tips on performance tuning Client vs. Server evaluation

EF Core’s Client vs. Server evaluation feature comes with some very useful safeguards you can use to make sure you aren’t producing poorly performing queries.

Firstly, EF Core will log a warning the first time a Client vs. Server evaluation will adversely affect performance, and this logging is intelligent. In the case of my book query the use of string.Join produces no warning, because it does not affect performance. However, if I add the .OrderBy(p => p.AuthorsString) command then I do get a warning.

Secondly, if you want to stop any poorly performing Client vs. Server evaluation queries from happening, you can configure EF Core to throw an exception when such a query is found. See the code below:

protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
{
    optionsBuilder.UseSqlServer(
        @"Server=(localdb)\mssqllocaldb;Database=MyDatabase;Trusted_Connection=True;")
        .ConfigureWarnings(warnings =>
            warnings.Throw(RelationalEventId.QueryClientEvaluationWarning));
}

Note: The Microsoft documentation for Client vs. Server evaluation can be found at https://docs.microsoft.com/en-us/ef/core/querying/client-eval

Conclusion

EF Core has a load of nice features and, at the moment, misses some EF6.x features that I rather liked. But I think Client vs. Server evaluation is one of those great features that will help me to build queries quickly, and often at no reduction in performance. For that I am very grateful.

Happy coding

Six ways to build better Entity Framework (Core and EF6) applications

In this article I describe six different approaches to building applications that use Microsoft’s database access framework, Entity Framework (EF). All six approaches are based on software principles and patterns that many of you will be familiar with. This article shows how I applied these principles and patterns in real-world applications that use EF. The six principles and patterns are:

  1. Separation of concerns – building the right architecture.
  2. The Service Layer – separating data actions from presentation actions.
  3. Repositories – picking the right sort of database access pattern.
  4. Dependency injection – turning your database code into services.
  5. Building business logic – using Domain-Driven Design with EF.
  6. Performance tuning EF – get it working, but be ready to make it faster if you need to.

I have an interesting perspective when it comes to software principles and patterns. I started out as a programmer, but then moved into technical management in 1988. I came back to programming in 2009, a 21-year gap, and the whole software world had changed. Thankfully, over those years some seriously clever people had been thinking about the science of programming, and I avidly read their articles and books.

While it was great to learn things from these software giants, I found it took some time to find the right ways to apply their ideas in the applications I developed. This article brings together the combination of ideas from some of the great software thinkers with my years of learning as I have applied and refined their approaches in building applications that used EF for database access.

All the software and figures in this article come from a book I am writing for Manning Publications called Entity Framework Core in Action, which is now on early-access release, i.e. if you buy now you will get the chapters as I write them, and the full book when it’s finished. The code is therefore based on the new .NET Core frameworks: EF Core and ASP.NET Core. However, these software principles and patterns apply equally well to the older Entity Framework, version 6 (EF6.x) framework and ASP.NET MVC5.

Note: I’m going to assume you know about Entity Framework already. If you don’t then I recommend you read the first chapter of my book, which is free, or look at Microsoft’s EF Core documentation which includes a number of example applications.

Before I start I need to say that the descriptions I give are my way of implementing the principles and patterns I talk about. There are plenty of other ways to implement each of the six topics; mine are just one of them. Have fun developing your own techniques and libraries that will improve and accelerate your development.

1. Principle: Separation of concerns – building on the right architecture

There is a software principle called Separation of Concerns (SoC), which says that you should:

  • Group together code that has similar, or strongly-related, functions – in .NET terms, put it in separate projects. This is called cohesion.
  • Make each group/project as self-contained as possible. Each piece of code should have a clear interface and scope of work that is unlikely to change because of other callers changing what they do. This is called low coupling.

NOTE: Here is a more in-depth article on Separation of Concerns if you want to look into SoC some more.

For simplicity most examples of EF code on the web show EF database commands being called directly from whatever application type they are using. This doesn’t follow SoC, and nor does it really represent how real applications are written. I took the decision in my book to use a more representative software architecture for my example code, and I used a layered architecture. Using this does make it a little more difficult for the reader, but I build up the application structure over a number of chapters. Here is the final architecture of my example book selling site.

I could have used a number of different architectures, see this link for a good list, but the layered approach works well for small to medium applications. A layered architecture is also a good fit for cloud hosting, where cloud providers can spin up more instances of the web application if it is under a heavy load, that is, they will run multiple copies of the web application and place a load balancer in front to spread the load over all the copies. This is known as scale out on Microsoft’s Azure and auto scaling on Amazon’s AWS.

The figure below shows how I apply SoC to my database access code. It shows the same software architecture, but with all my EF database access code highlighted in bubbles. The size of the bubbles relates to the amount of EF code you will find in each layer. Notice that the ASP.NET Core project and the pure business logic (BizLogic) project have no EF Core query/update code in them at all.

As I go through this article I will refer back to SoC, as it is an integral part of how I approach database applications.

2. Pattern: The Service Layer – separating data actions from presentation actions

One of the influential books I read when I came back to programming was Dino Esposito and Andrea Saltarello’s book Microsoft .NET: Architecting Applications for the Enterprise, published in 2009. Some of the technology they covered then is now superseded, but the patterns are still relevant today (Dino has written a number of newer books, see this link). This book introduced me to the use of a Service Layer, which Martin Fowler had previously described in his classic book in 2002.

Dino Esposito says in his book that the Service Layer “sets a boundary between two interfacing layers” and Martin Fowler’s site link says the Service Layer “Defines an application’s boundary with a layer of services that establishes a set of available operations and coordinates the application’s response in each operation”. That all sounds great, but how does that help my applications? Quite a bit actually.

I’m going to describe how the Service Layer acts as an adapter in this section. Later, in the section on business logic, I will cover the second way the Service Layer can serve me, by being in command of running my business code.

The Service Layer as an adapter

In a layered architecture there is often a data mismatch between the database/business logic and the presentation layer. The Domain-Driven Design (DDD) approach, which I describe later, says that the database and the business logic should be focused on the business rules, while the Presentation Layer is about giving the user a great user experience, or in the case of a web service, providing a standard and simple API.

For this reason, the Service Layer becomes a crucial layer, as it can be the layer that understands both sides and can transform the data between the two worlds. This keeps the business logic and the database uncluttered by presentation needs, like drop-down lists and JSON AJAX calls. Similarly, by having the Service Layer deliver pre-formatted data in exactly the form the presentation layer needs, it makes it much simpler for the presentation layer to show that data.

When dealing with database accesses via EF, the Service Layer uses an adapter pattern to transform data from the data layer/business logic layers to/from the presentation layer. Databases tend to minimise duplication of data and maximise the relational links between data, while the presentation layer is about showing the data in a form that the user finds useful.

The figure below shows an example of this difference in approach. The image comes from the list of books produced by the example book selling site that I create in my book. You can see how I have to pick data from lots of different tables in the database, and do some calculations, to form a summary of a book in my book list display.

Note: You can see this book list in action on the live site that hosts the example book selling site, at http://efcoreinaction.com/

EF provides a way of building queries, called select loading, that can ‘pick out’ the relevant columns from each table and combine them into a DTO/ViewModel class that exactly fits the user view. I apply this transform in the Service Layer, along with other sorting, filtering and paging features. The listing below is the select query using EF Core to build the book summary you just saw in the figure above.

public static IQueryable<BookListDto> 
    MapBookToDto(this IQueryable<Book> books)   
{
    return books.Select(p => new BookListDto
    {
        BookId = p.BookId,                      
        Title = p.Title,                        
        Price = p.Price,                        
        PublishedOn = p.PublishedOn,            
        ActualPrice = p.Promotion == null       
                ? p.Price : p.Promotion.NewPrice,         
        PromotionPromotionalText =              
                p.Promotion == null             
                  ? null : p.Promotion.PromotionalText,
        AuthorsOrdered = string.Join(", ",      
                p.AuthorsLink                   
                .OrderBy(q => q.Order)          
                .Select(q => q.Author.Name)),   
        ReviewsCount = p.Reviews.Count,         
        ReviewsAverageVotes =                   
                p.Reviews.Count == 0            
                ? null                          
                : (decimal?)p.Reviews           
                    .Select(q => q.NumStars).Average()
    });
} 

Note: The code above wouldn’t work with EF6.x because it includes the command string.Join, which cannot be converted into SQL by EF6.x. EF Core has a feature called Client vs. Server Evaluation, which allows methods that cannot be translated to SQL to be included. They are run after the data has been returned from the database.

Yes, this code is complex, but to build the summary we need to pull data from lots of different places and do some calculations at the same time, so that’s what you get. I have built a library called GenericServices (currently only available for EF6.x) which automates the building of EF select loading commands like this by using a LINQ mapper called AutoMapper. This significantly improves the speed of development of these complex queries.

3. Pattern: Picking the right sort of database access pattern

There are a number of different ways you can form your EF database access inside an application, with different levels of hiding the EF access code from the rest of the application. In the figure below I show four different data access patterns.

The four types of database access patterns are:

  1. Repository + Unit of Work (Repo+UOW). This hides all the EF Core code behind code that provides a different interface to EF. The idea being you could replace EF with another database access framework with no change to the methods that call the Repo+UOW.
  2. EF repository. This is a repository pattern that doesn’t try to hide the EF code in the way the Repo+UOW pattern does. EF repositories assume that you as a developer know the rules of EF, such as using tracked entities and calling SaveChanges for updates, and that you will abide by them.
  3. Query Object. Query objects encapsulate the code for a database query, that is, a database read. They hold the whole code for a query or, for complex queries, part of a query. Query objects are normally built as extension methods with IQueryable<T> inputs and outputs so that they can be chained together to build more complex queries – see the sketch after this list.
  4. Direct calls to EF. This represents the case where you simply place the EF code you need in the method that needs it. For instance, all the EF code to build a list of books would be in the ASP.NET action method that shows that list.
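As an illustration of the Query Object pattern, here is a minimal sketch (the method names are mine, not from the book) of two query objects being chained together:

public static class BookQueryObjects
{
    //query object 1: sorts the books by publication date
    public static IQueryable<Book> OrderBooksByPublishDate(
        this IQueryable<Book> books)
        => books.OrderByDescending(x => x.PublishedOn);

    //query object 2: pages the (sorted) books
    public static IQueryable<Book> Page(
        this IQueryable<Book> books, int pageNum, int pageSize)
        => books.Skip(pageNum * pageSize).Take(pageSize);
}

//usage: the query objects chain together, and the combined query is
//only turned into SQL and run when ToList() is called
var display = context.Books
    .OrderBooksByPublishDate()
    .Page(0, 10)
    .ToList();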

NOTE: As I said earlier, I have created a library called GenericServices for EF6.x (and EF Core in the future). This is a form of EF repository.

I used the Repo+UOW pattern, which was the recommended approach at the time, in a big project in 2014 – and I found it was really hard work. I, and many others, realised Repo+UOW wasn’t the way to go – see my article ‘Is the Repository pattern useful with Entity Framework?’. The Repo+UOW can be a valid pattern in some cases where hiding certain parts of the data is needed, but I think there are better ways to do this with some of the new EF Core features, such as backing fields.

At the other end of the spectrum are direct calls to EF in the method that needs the data. This fails the separation of concerns principle, because the database code is mixed in with other code not directly involved in database issues.

So, having ruled out the two extremes I would recommend:

  • Query Objects for building queries, often breaking down large queries into a series of query objects. The earlier listing in this article, the method called MapBookToDto, is a query object. I cover query objects in chapter 2 of my book.
  • For Create, Update and Delete (and business logic, which I cover later) I use an EF repository pattern, that is, I create a method that encapsulates the EF database access. This isolates the EF code and makes it easier to refactor or performance tune that code.

The listing below shows a class with two EF repository methods for changing the publication date of a book in my example book selling site. I cover this in chapter 3 of my book.

public class ChangePubDateService : IChangePubDateService
{
    private readonly EfCoreContext _context;

    public ChangePubDateService(EfCoreContext context)
    {
        _context = context;
    }

    public ChangePubDateDto GetOriginal(int id)    
    {
        return _context.Books
            .Select(p => new ChangePubDateDto      
            {                                      
                BookId = p.BookId,                 
                Title = p.Title,                   
                PublishedOn = p.PublishedOn        
            })                                     
            .Single(k => k.BookId == id);          
    }

    public Book UpdateBook(ChangePubDateDto dto)   
    {
        var book = _context.Books.Find(dto.BookId);
        book.PublishedOn = dto.PublishedOn;        
        _context.SaveChanges();                    
        return book;                               
    }
}

4. Pattern: Turning your database code into services

I have used dependency injection (DI) for years and I think it’s a really useful approach. I want to show you a way you can inject your database access code into an ASP.NET Core application.

Note: If you haven’t used DI before, have a look at this article for an introduction, or this longer article from another of the great thinkers, Martin Fowler.

The benefits of doing this are twofold. Firstly, DI will dynamically link your database access code into the parts of the presentation/web API code that need it. Secondly, because I am using interfaces, it is very easy to replace the calls to the database access code with mocks for unit testing.

I haven’t got the space in this article to give you all the details (it takes five pages in chapter 5 of my book to cover this), but here are the main steps, with links to online documentation if you want to follow them up:

  1. You need to make each piece of your database access code into a thin repository, that is, a class containing a method, or methods, that the front-end code needs to call. See the ChangePubDateService class listed above.
  2. You need to add an interface to each EF repository class. You can see the IChangePubDateService interface applied to the ChangePubDateService class listed above.
  3. You need to register your EF repository class against its interface in the DI provider. This will depend on your application; for ASP.NET Core see this article, and the sketch after this list.
  4. Then you need to inject it into the front-end method that needs it. In ASP.NET Core you can inject into an action method using the [FromServices] attribute. Note: I use this DI parameter injection rather than the more normal constructor injection because it means I only create the class when I really need it, i.e. it is more efficient this way.
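For step 3, here is a minimal sketch (assuming ASP.NET Core’s standard Startup class; the connection string name is my own choice) of registering the ChangePubDateService class against its interface:

public void ConfigureServices(IServiceCollection services)
{
    //register the application's DbContext
    services.AddDbContext<EfCoreContext>(options =>
        options.UseSqlServer(
            Configuration.GetConnectionString("DefaultConnection")));
    //register the EF repository class against its interface
    services.AddTransient<IChangePubDateService, ChangePubDateService>();
    services.AddMvc();
}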

Note: I realise that is a lot to take in. If you need more information you can look at the GitHub repo associated with my book.

At the end of this you have a method you can call to access the database. The listing below shows an ASP.NET Core action method that calls the UpdateBook method of the ChangePubDateService class that I listed previously. The [FromServices] attribute on the second parameter tells the DI provider to inject the ChangePubDateService class into the parameter called service.


[HttpPost]
[ValidateAntiForgeryToken]
public IActionResult ChangePubDate(ChangePubDateDto dto,
    [FromServices]IChangePubDateService service)
{
    service.UpdateBook(dto);
    return View("BookUpdated",
        "Successfully changed publication date");
}

NOTE: There is a way to do parameter injecting into an ASP.NET MVC action method, but it involves you having to override the default Binder. See The section “How DI is used in SampleMvcWebApp” at the bottom of this page, and my DiModelBinder in the associated GitHub repo.

The result of all this is that database code is nicely separated into its own class/method and your front-end code just has to call the method, not knowing what it contains. And unit testing is easy, as you can check the database access code on its own, and replace the same code in your front-end call with a mocking class that implements the interface.

5. Pattern: Building business logic – using Domain-Driven Design

Real-world applications are built to supply some sort of service, ranging from holding a simple list of things on your computer through to managing a nuclear reactor. Every different real-world problem has a set of rules, often referred to as business rules, or by the more generic name, domain rules.

Another book that I read some time ago that had a big impact on me was “Domain-Driven Design” by Eric Evans. The Domain-Driven Design (DDD) approach says that the business problem you are trying to solve must drive the whole of the development. Eric then goes on to explain how the business logic should be isolated from everything else other than the database classes, so that you can give all your attention to what Eric Evans calls the “difficult task” of writing business logic.

There are lots of debates about whether EF Core is suitable for a DDD approach, because the business logic code is normally separate from the EF entity classes which it maps to the database. However, Eric Evans is pragmatic on this point and says in the section entitled “Working within your (database access) Frameworks” that, and I quote:

“In general, don’t fight your framework. Seek ways to keep the fundamentals of domain-driven design and let go of the specifics when your framework is antagonistic”
Page 157, Domain-Driven Design, by Eric Evans, 2004.

Note: I had to look up the word antagonistic: it means “showing or feeling active opposition or hostility towards someone or something”.

Over the years I have developed a DDD approach that works with EF and I have dedicated the whole of chapter 4 of my book to the topic of business logic because it is so important to get right. Here is a summary of the guidelines in that chapter:

  1. The business logic has first call on how the database structure is defined

Because the problem I am trying to solve, called the “Domain Model” by Eric Evans, is the heart of the application, it should define the way the whole application is designed. Therefore, I try to make the database structure, and the entity classes, match my business logic data needs as much as I can.

  2. The business logic should have no distractions

Writing the business logic is difficult enough in itself, so I isolate it from all the other application layers, other than the entity classes. That means when I write the business logic I only have to think about the business problem I am trying to fix. I leave the task of adapting the data for presentation to the Service Layer in my application.

  3. Business logic should think it is working on in-memory data

This is something Eric Evans taught me – write your business logic as if the data was in-memory. Of course there need to be ‘load’ and ‘save’ parts, but for the core of my business logic I treat the data, as much as is practical, as if it is a normal, in-memory class or collection.

  4. Isolate the database access code into a separate project

This fairly new rule came out of writing an e-commerce application with some complex pricing and delivery rules. Before this I used EF directly in my business logic, but I found that it was hard to maintain, and difficult to performance tune. Now I have another project, which is a companion to the business logic, that holds all the database access code.

  5. The business logic should not call EF Core’s SaveChanges directly

The business logic does not call EF Core’s SaveChanges method directly. Instead, I have a class in the Service Layer whose job it is to run the business logic – this is a case of the Service Layer implementing the command pattern – and, if there are no errors, it calls SaveChanges. The main reason is to have control of whether to write the data out, but there are other benefits that I describe in the book.
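To make that concrete, here is a minimal sketch (the interface and class names are illustrative, not the book’s actual code) of a Service Layer class that runs a piece of business logic and only calls SaveChanges if the business logic reports no errors:

public interface IBizAction<TIn, TOut>   //a piece of business logic
{
    bool HasErrors { get; }
    TOut Action(TIn input);
}

public class RunnerWriteDb<TIn, TOut>    //Service Layer command runner
{
    private readonly IBizAction<TIn, TOut> _action;
    private readonly EfCoreContext _context;

    public bool HasErrors => _action.HasErrors;

    public RunnerWriteDb(IBizAction<TIn, TOut> action,
        EfCoreContext context)
    {
        _action = action;
        _context = context;
    }

    public TOut RunAction(TIn input)
    {
        var result = _action.Action(input); //run the business logic
        if (!HasErrors)                     //only write to the database
            _context.SaveChanges();         //if there were no errors
        return result;
    }
}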

The figure below shows the original software architecture, but with the focus on how the business logic is handled. The five numbers, with comments, match the numbered guidelines above.

In my book I use the processing of an order for books as an example of a piece of business logic. You can see this business logic in action by going to the companion live site, http://efcoreinaction.com/, where you can ‘buy’ a book. The site uses an HTTP cookie to hold your basket and your identity (which saves you having to log in). No money is needed – as the terms and conditions say, you aren’t actually going to buy a book.

The code is too long to add to this article, but I have written another article called Architecture of Business Layer working with Entity Framework (Core and v6) which covers the same area in more detail and contains plenty of code examples.

6. Principle: Get your EF code working, but be ready to make it faster if you need to.

The recommended approach to developing software is to get it to work, and then worry about making it faster. Another, more nuanced, approach, attributed to Kent Beck, is: Make it Work. Make it Right. Make it Fast. Either way, these principles say we should leave performance tuning to the end. I would add a second part: you should only performance tune if you need to.

In this article I am talking about database accesses via EF. I can develop pretty complex database accesses in EF really quickly – at least five times faster than using ADO.NET or Dapper. That covers the “get it working” part. The downside is that EF doesn’t always produce the best-performing SQL commands: sometimes it’s because EF didn’t come up with a good SQL translation, and sometimes it’s because the LINQ code I wrote isn’t as efficient as I thought it was. The question is: does it matter?

For example, I developed a small e-commerce site (the code took me 10 months) which had a little over 100 different database accesses and about 20 tables. More than 60% of the database accesses were on the admin side, with maybe 10% of accesses that really mattered to the paying user.

To show this graphically I have picked out three features from my example book selling site and then graded them by two scales:

  • Vertically, what the user expects in terms of performance.
  • Horizontally, how difficult the database access is.

This gives you the figure below, with the top right highlighted as the area where we really need to think about performance tuning.

My analysis says that only the book search query needs work to improve it. The user is used to fast searches thanks to Google etc. and will get frustrated if my application is too slow. Looking at the complexity of the book search, which includes filtering on things like average user votes, I can see that it produces some rather complex database access commands.

It turns out that the EF Core code for my book search performs badly, but there is plenty I can do about it. In fact, I have mapped out a whole section towards the end of my book where I show how I can improve the book search in a series of stages, each getting more complex and taking more development time. They are:

  1. Can I improve the basic EF commands by rearranging or refining my EF code?
  2. Can I convert some or all of the EF code into direct SQL commands, calculated columns, stored procedures etc.?
  3. Can I change the database structure, such as de-normalising the database, to improve the search performance?

Improving the book search will take quite a bit of time, but it’s worth it in this application. Quite honestly the other features aren’t worth the effort, as they are fast enough using standard EF Core commands.

Planning for possible performance tuning

While I totally agree with the idea that you shouldn’t try to performance tune too early, it is sensible to plan for the possibility that you will have to performance tune. All the other approaches I have described, especially the encapsulation of the database accesses, mean that my database code is a) clearly isolated, and b) open for performance tuning.

So, my practice is to develop database accesses quickly with EF, but organise the code so that I encapsulate the database code cleanly. Then, if I have a performance problem, I can tune the database access code with minimal impact on other parts of the application.

Conclusion

When I came back to programming in 2009 some of my early code ended up as long methods which were intertwined with each other – they worked, but the code was hard to understand, debug and maintain. Since then I have taken the principles and practices from the software giants and worked out how to apply them to real applications. With these I can write code that is robust, easy to test and easy to improve – and I write code much faster too.

However, for me, there is a gap between seeing a great idea and working out how it could help me. Some books I instantly loved, like Eric Evans’ “Domain-Driven Design”, but it still took two or three projects before I had an implementation that worked really well for me. I don’t think I’m unusual in this, as learning any new technique takes some time before it becomes smooth and natural.

There are many great software thinkers out there and some great principles and practices. So the next time you think “there must be a better way of doing this” then go and have a look for other people’s thoughts. It’s a journey, but the reward is becoming a better developer.

Happy coding!

More information on my book, Entity Framework Core in Action

Quite a bit is going on around the book I am writing, called Entity Framework Core in Action, now that it is on early release. In this short article I want to point you to some resources you might like to look at, and also give you a bit more on the planned content of the book.

Resources available for the book

The following resources are available:

  • A slideshare presentation giving you an overview of what’s in the book.
    IMPORTANT: discount code in the slideshare presentation! See first slide.
  • The book page, with links to the first chapter, which is free.
  • The GitHub repo with all the code that goes with the book.
    NOTE: I use a branch-per-chapter, so you need to select a branch to see the code.
  • The live site, at http://efcoreinaction.com/ where you can see the example book selling site that I build in chapters 1 to 5, and improve in later chapters.

Note: You can buy an early-access version of my book. You will get the chapters as I write them, and the full book when it is released. This means you have early access to the information, and you get the final book with any updates I make to the chapters before release.

Table of Content for the book

Here is the current table of contents for my book. It is quite possible that a couple of chapters in parts 2 and 3 might get too big and be split into two.

Part 1:  Introduction to Entity Framework Core

By the end of part 1 you should be able to build a .NET application that uses a relational database, but how the database is organized is left to EF Core; for instance, EF Core’s default configuration will set the type and size of the database columns, which can be a bit wasteful on space. I also look at building and deploying a web application that uses EF Core to handle its database.

The chapters are:

  1. Introduction to Entity Framework Core
  2. Querying the database
  3. Updating the database: Create, Update & Delete
  4. Using EF Core in Business Logic
  5. Using EF Core in ASP.NET Core web applications

Part 2:  Entity Framework Core In Depth

Part 2 is more of an EF Core reference section. By the end of part 2 you will be able to create a database with the exact type and size of each column, or link to an existing database which has a specific schema. I go deeper into the commands in EF Core’s DbContext and look at different ways of migrating a database.

The chapters are:

  1. Modelling options for scalar properties
  2. Modelling options for relationships
  3. Going deeper into the DbContext
  4. Handling database migrations

Part 3: Using Entity Framework in real-world applications

Part 3 is all about improving your skills and making you a better developer, and debugger, of EF Core applications. I look at real-world applications of EF Core, starting with a range of known patterns and practices that you can use with EF Core, with chapters on unit testing EF Core applications, extending EF Core, and, most importantly, how to find and fix EF Core performance issues.

The chapters are:

  1. Patterns for using EF Core
  2. Unit Testing EF Core applications
  3. EF Core Performance Tuning
  4. Extending EF Core

Appendices

A brief introduction to LINQ

Conclusion

A pretty short and sweet post, but I hope it’s helpful. My next post will be much bigger.

Happy coding!

 

Updating many to many relationships in Entity Framework Core

I wrote an article called Updating many to many relationships in entity framework back in 2014, which is still proving to be popular in 2017. To celebrate the release of my book Entity Framework Core in Action I am producing an updated version of that article, but for Entity Framework Core (EF Core).

All the information and the code come from chapter 2 of my book. In the book I use a book selling site, with books linked to authors via a many-to-many relationship. You can see the live example site, and if you scroll down you will see books with multiple authors.

All the unit tests and classes I use in this article can be found in the Chapter03 branch of the Git repo associated with this book.

NOTE: If you are in a hurry I have added summaries for each section – just search for the word Summary to get the answers without needing to look at the code examples.

Creating a many-to-many relationship in EF Core

The example I am going to use is one taken from the book I am writing. It has a Book entity linked to its Author(s) entities via a BookAuthor table. The database diagram below shows the three tables, with the foreign key links from the BookAuthor linking table to the other two tables.

A linking table like this has foreign keys (FK in the diagram) which link to the primary key(s) (PK in the diagram) of each end of the relationship – in this case the BookId of the Book entity and the AuthorId of the Author entity. These foreign keys then form a composite primary key for the BookAuthor table.

EF6.x created this table for you when you defined a many-to-many relationship, but EF Core, which takes a leaner approach, doesn’t – you need to do it yourself. It’s not hard, so let me give you the code that defines the BookAuthor entity class.

public class BookAuthor               
{
    public int BookId { get; set; }   
    public int AuthorId { get; set; } 
    public byte Order { get; set; }   

    //-----------------------------
    //Relationships

    public Book Book { get; set; }    
    public Author Author { get; set; }
}

NOTE: I have a property called Order because the order of the authors matters. If the order of items in your many-to-many list doesn’t matter then you can leave that out.

EF Core will find the relationships because I have used names that it understands:

  • I used the same names, BookId and AuthorId, as the primary keys in the Book class and the Author class.
  • Because I used the classes Book and Author, which EF Core knows are part of the database, it knows this is a relationship.

NOTE: The easiest approach to setting up relationships is by using EF Core’s conventions, which is what I do. But if you want to explicitly define the relationships you can. There are loads of rules, which I cover in chapter 7 of my book, or you can look at the EF Core docs.

Now, the one thing that EF Core can’t work out is the primary key of the BookAuthor table, because it doesn’t follow the normal convention. There are a number of ways to define the primary key(s) of an entity; I have used EF Core’s Fluent API, which I access via the OnModelCreating method inside my DbContext, as shown below:

public class EfCoreContext : DbContext
{
    public DbSet<Book> Books { get; set; }            
    public DbSet<Author> Authors { get; set; } 
    public DbSet<PriceOffer> PriceOffers { get; set; }       

    public EfCoreContext(                             
        DbContextOptions<EfCoreContext> options)      
        : base(options) {}

    protected override void
        OnModelCreating(ModelBuilder modelBuilder)    
    {                                                 
        modelBuilder.Entity<BookAuthor>()             
            .HasKey(x => new {x.BookId, x.AuthorId});
    }                                                 
}

You can see the .HasKey method towards the bottom of the code where I define the composite key consisting of the BookId and the AuthorId.

NOTE: In EF6.x you could define a composite key with the [Key] and [Column(Order = n)] attributes, but EF Core needs the Fluent API for composite keys – and I find the Fluent API a pretty good way of doing it anyway.

Summary – how to add a many-to-many relationship

To add a many-to-many relationship in EF Core you need to:

  1. Create a linking entity class that has the foreign key(s) of the two entities you want to form a many-to-many link between. In my example these are BookId and AuthorId.
  2. It is efficient (but not necessary) to make these two foreign keys into the composite primary key. This will ensure that there is only ever one link between the two entities you want. It is also more efficient in terms of the size of the database table.
    1. If you do want the two foreign keys to be the composite primary key you need to configure this in EF Core via the Fluent API, because it cannot find them using its normal “by convention” approach.
  3. The linking class can, but doesn’t have to, have relationship links to the two entities it links. In my example my BookAuthor class has a property Book of type Book, and a property Author of type Author.
  4. You most likely want a collection relationship to the linking table in one or both of the linked entities. In my example I have a property called AuthorsLink of type ICollection<BookAuthor> in my Book class, and a property called BooksLink of type ICollection<BookAuthor> in my Author class – see the sketch below.
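To illustrate points 3 and 4, here is a cut-down sketch of the Book and Author entity classes (only the properties relevant to the relationship are shown):

public class Book
{
    public int BookId { get; set; }
    public string Title { get; set; }
    //... other properties left out ...

    //navigation collection to the linking table
    public ICollection<BookAuthor> AuthorsLink { get; set; }
}

public class Author
{
    public int AuthorId { get; set; }
    public string Name { get; set; }

    //navigation collection to the linking table
    public ICollection<BookAuthor> BooksLink { get; set; }
}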

Adding a Book many-to-many Author link

Creating a new many-to-many link is fairly easy in EF Core (though not as easy as in EF6.x). The code below creates a new book with a new author.

Note: the variable context used at the bottom of the code below is an instance of the application’s DbContext, called EfCoreContext in this case (see its definition above).

var book = new Book
{
  Title = "Quantum Networking",
  Description = "faster-than-light data communications",
  PublishedOn = new DateTime(2057, 1, 1),
  Price = 220
};
var author = new Author { Name = "Future Person" };
book.AuthorsLink = new List<BookAuthor>
{
  new BookAuthor { 
    Author = author, 
    Book = book,
    Order = 0
  }
};

//Now add this book, with all its relationships, to the database
context.Books.Add(book);
context.SaveChanges();

Updating a many-to-many relationship

It turns out there are two scenarios under which you want to update a relationship. They are:

  1. Connected state: This is where the load of the data and the update are done in one go, that is, within one instance of the application’s DbContext. You find this sort of update in a console application, or inside your business logic (see this link on business logic and EF Core).
  2. Disconnected state: This is where the update is split into two halves: a) select the entities you want to change, and b) make the change. Each stage has a different instance of the application’s DbContext. This happens on web sites, where in the first stage the user picks what they want done and posts it back. The second stage then has to re-read the data and update it.

I am going to describe these two approaches separately, so you can go to the one that fits the application you are building.

1. Connected state update

In the connected state we read in the Book entity and update it immediately, i.e. within the same instance of the DbContext. For my Book/Author example I am going to read in the Book, with its linking entities, BookAuthor, as tracked entities – that means that EF Core will take a copy of the entities we read in, so that it can spot any changes when you call SaveChanges.

The listing below is one of my unit tests, which adds the existing author “Martin Fowler” to the book called “Quantum Networking”, which currently has one author called “Future Person”. After the test has finished the book “Quantum Networking” has two authors, “Future Person” and “Martin Fowler”.

NOTE: I am using a Sqlite in-memory database, which I seed with four books with known titles and authors.

[Fact]
public void TestAddExtraAuthorOk()
{
    //SETUP
    var inMemDb = new SqliteInMemory();

    using (var context = inMemDb.GetContextWithSetup())
    {
        context.SeedDatabaseFourBooks();

        //ATTEMPT
        var book = context.Books                          
            .Include(p => p.AuthorsLink)                  
            .Single(p => p.Title == "Quantum Networking");

        var newAuthor = context.Authors                   
            .Single(p => p.Name == "Martin Fowler");

        book.AuthorsLink.Add(new BookAuthor
        {
            Book = book,
            Author = newAuthor,
            Order = (byte) book.AuthorsLink.Count
        });
        context.SaveChanges();

        //VERIFY
        var bookAgain = context.Books
            .Include(p => p.AuthorsLink)
            .Single(p => p.BookId == book.BookId);
        bookAgain.AuthorsLink.ShouldNotBeNull();
        bookAgain.AuthorsLink.Count.ShouldEqual(2);
        var authorsInOrder = bookAgain.AuthorsLink.OrderBy(p => p.Order).ToList();
        authorsInOrder.First().Author.Name.ShouldEqual("Future Person");
        authorsInOrder.Last().Author.Name.ShouldEqual("Martin Fowler");
    }
}

The most important part of the code is the first query, where I load the Book entity, using an Include method to load its AuthorsLink collection at the same time, as tracked entities. The AuthorsLink property holds a collection of BookAuthor entities linking to the Author entities.

Note: The default way of loading data in EF Core is as tracked entities, that is, EF Core takes a copy of the loaded data so that it can detect if any changes have happened since it was loaded. You can turn off tracking by adding the method .AsNoTracking to your query (useful in read-only queries, as it makes them a little faster).
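For example, a read-only version of the load might look like this (a sketch, not code from the book):

//AsNoTracking tells EF Core not to take a copy for change detection,
//which makes a read-only query a little faster
var book = context.Books
    .AsNoTracking()
    .Include(p => p.AuthorsLink)
    .Single(p => p.Title == "Quantum Networking");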

Next I load, again as a tracked entity, the Author entity whose name is “Martin Fowler”, as that is the author I want to add to this book.

Then I create a new BookAuthor linking entry linking the Book entity I loaded to the “Martin Fowler” Author entity.

When SaveChanges is called it finds that the AuthorsLink property of the book instance it loaded has changed. Because this is a relationship it looks at the entities and finds that there is a new BookAuthor entity, which it adds to the database, and an author which is already in the database, so it doesn’t have to add that.

Summary – Connected state many-to-many update

To update a many-to-many relationship in the connected state:

  1. Load an entity at one end of the relationship (it doesn’t matter which end) with the collection of links in the many-to-many linking table. In my example I loaded the Book entity I wanted to change, along with its AuthorsLink property, which holds the collection of the linking entities in the BookAuthor table.
  2. Alter the linking table collection, either by adding or removing entries from the collection. In my example I added a new BookAuthor entry to the AuthorsLink collection.
    1. Note: if you want to replace the whole collection then you can simply assign a new list to the linking collection, as the sketch below shows (there is also an example of this in my unit tests – see the test named TestChangeAuthorsNewListOk).
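Here is a minimal sketch of that replacement, reusing the book and newAuthor variables from the test above:

//assigning a new list replaces the whole collection: on SaveChanges
//EF Core deletes the old BookAuthor rows and inserts the new one
book.AuthorsLink = new List<BookAuthor>
{
    new BookAuthor { Book = book, Author = newAuthor, Order = 0 }
};
context.SaveChanges();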

2. Disconnected state

The disconnected state happens when the initial read of the data and the update of the data are done separately, i.e. they use different instances of the application’s DbContext. This happens in a web application, which has two stages:

  1. The first stage is where the user is presented with the book, its current author(s) and a list of possible authors to choose from. Once they have chosen the author to add, they press a button which sends their choice back to the web application.
  2. In the second stage the web application needs to re-read the data and execute the database update.

The unit test below does the same as the previous example, that is, it adds the existing author “Martin Fowler” to the book called “Quantum Networking”, which starts out with one author called “Future Person”. After the test has finished the book “Quantum Networking” has two authors, “Future Person” and “Martin Fowler”.

I simulate the disconnected state by having two instances of the application’s DbContext and passing the primary keys of the Book and the new Author to add via a small class called ChangeAuthorDto.

Note: This unit test uses two separate instances of the DbContext to simulate the disconnected state. With the help of Daria (see comment below) I have found a way to use Sqlite in-memory with multiple instances of the DbContext. Daria pointed out that the in-memory part is held in the Sqlite connection, so I create one options object (which uses the Sqlite connection) and use that to create two separate DbContext instances – thanks Daria.

[Fact]
public void TestAddAuthorDisconnectedOk()
{
    //SETUP
    var options = SqliteInMemory.CreateOptions<EfCoreContext>();
    ChangeAuthorDto dto;
    using (var context = new EfCoreContext(options))
    {
        context.Database.EnsureCreated();
        context.SeedDatabaseFourBooks();
        var book = context.Books
            .Include(p => p.AuthorsLink)
            .Single(p => p.Title == "Quantum Networking");

        var newAuthor = context.Authors
            .Single(p => p.Name == "Martin Fowler");
        dto = new ChangeAuthorDto
        {
            BookId = book.BookId,
            NewAuthorId = newAuthor.AuthorId
        };
    }

    using (var context = new EfCoreContext(options))
    {
        //ATTEMPT
        var book = context.Books
            .Include(p => p.AuthorsLink)
            .Single(p => p.BookId == dto.BookId);
        var newAuthor = context.Authors.Find(dto.NewAuthorId);

        book.AuthorsLink.Add(new BookAuthor
        {
            Book = book,
            Author = newAuthor,
            Order = (byte)book.AuthorsLink.Count
        });
        context.SaveChanges();

        //VERIFY
        var bookAgain = context.Books
            .Include(p => p.AuthorsLink).ThenInclude(p => p.Author)
            .Single(p => p.BookId == dto.BookId);
        bookAgain.AuthorsLink.ShouldNotBeNull();
        bookAgain.AuthorsLink.Count.ShouldEqual(2);
        var authorsInOrder = bookAgain.AuthorsLink.OrderBy(p => p.Order).ToList();
        authorsInOrder.First().Author.Name.ShouldEqual("Future Person");
        authorsInOrder.Last().Author.Name.ShouldEqual("Martin Fowler");
    }
}

The important part is in the second DbContext instance, where I load in the Book, with its BookAuthor linking collection accessed via the Book’s AuthorsLink property, and I read in the Author that the user wants to add.

Having loaded that book I can use the same code I had in the connected state, that is, I create a new BookAuthor linking entry linking the Book entity I loaded to the “Martin Fowler” Author entity, and add it to the existing AuthorsLink collection.

Summary – Disconnected state many-to-many update

To update a many-to-many relationship in the disconnected state:

  1. In the first stage you should return the primary key(s) of the two ends of the many-to-many relationship that you want to link/unlink. In my example I provide the BookId and the AuthorId I want to add to the book.
  2. In stage two you load one end of the relationship (it doesn’t matter which end) with the collection of links to the linking table. In my example I loaded the Book entity, using the BookId key, along with its AuthorsLink property, which holds the collection of the linking entities in the BookAuthor table.
  3. Now load the other end of the relationship. In my example the Author entity, using the AuthorId returned from the first stage.
  4. Alter the linking table collection, either by adding or removing entries from the collection. In my example I added a new BookAuthor entry to the AuthorsLink collection.
    1. Note: if you want to replace the whole collection then you can simply assign a new list, containing the new BookAuthor entries, to the linking collection.

2a. Another, quicker way to update a many-to-many relationship in the disconnected state

In chapter 3 of my book, Entity Framework Core in Action, I point out that you can often change relationships using just the foreign keys, and it can be more efficient. This is especially true of many-to-many relationships, because you can directly add/delete entries from the linking table, in this case BookAuthor.

By directly adding/deleting the linking table entries you save having to load the Book entity, with its AuthorsLink property, and the Author you want to add. Coupled with the fact that the first stage is going to supply you with the primary keys anyway, this saves quite a few database queries.

The unit test below shows this in action. The first stage of the unit test hands back primary keys of the Book and the new Author to add via the ChangeAuthorDto class. The second stage, which uses a new instance of the DbContext, uses those primary keys to build a new BookAuthor linking entity to add to the database.

NOTE: The one downside of this approach for my Book/Author example is setting the Order property, which must be set to give the correct order of the authors. For this example I know there is one author with an Order of 0, so I simply set it to 1. In a real application the first stage would have to define what order the authors should be shown in.

[Fact]
public void TestAddAuthorViaForeignKeyOk()
{
    //SETUP
    var options =
        this.NewMethodUniqueDatabaseSeeded4Books();

    ChangeAuthorDto dto;
    using (var context = new EfCoreContext(options))
    {
        var book = context.Books
            .Include(p => p.AuthorsLink)
            .Single(p => p.Title == "Quantum Networking");

        var newAuthor = context.Authors
            .Single(p => p.Name == "Martin Fowler");
        dto = new ChangeAuthorDto
        {
            BookId = book.BookId,
            NewAuthorId = newAuthor.AuthorId
        };
    }

    using (var context = new EfCoreContext(options))
    {
        //ATTEMPT
        context.Set<BookAuthor>().Add(new BookAuthor    
        {                                               
            BookId = dto.BookId,                        
            AuthorId = dto.NewAuthorId,                 
            Order = 1                                   
        });                                             
        context.SaveChanges();                          

        //VERIFY
        var bookAgain = context.Books
            .Include(p => p.AuthorsLink).ThenInclude(p => p.Author)
            .Single(p => p.BookId == dto.BookId);
        bookAgain.AuthorsLink.ShouldNotBeNull();
        bookAgain.AuthorsLink.Count.ShouldEqual(2);
        var authorsInOrder = bookAgain.AuthorsLink.OrderBy(p => p.Order).ToList();
        authorsInOrder.First().Author.Name.ShouldEqual("Future Person");
        authorsInOrder.Last().Author.Name.ShouldEqual("Martin Fowler");
    }
}

The important part is the ATTEMPT stage, where I create a new BookAuthor class using the BookId and the AuthorId of the new author I want to add to the book. If you compare this to the last disconnected example you will see this avoids three database loads (a sketch of the old approach follows the list):

  1. The Book we wanted to update
  2. The Book’s AuthorsLink collection of BookAuthor entities
  3. And the Author that I wanted to add.
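For contrast, here is a rough sketch of what the second stage of the previous, navigation-property approach had to do (simplified, and reusing the same dto as the unit test above):

using (var context = new EfCoreContext(options))
{
    var book = context.Books                 //load 1: the Book
        .Include(p => p.AuthorsLink)         //load 2: its BookAuthor links
        .Single(p => p.BookId == dto.BookId);
    var author = context.Authors             //load 3: the Author to add
        .Single(p => p.AuthorId == dto.NewAuthorId);

    book.AuthorsLink.Add(new BookAuthor
        { Book = book, Author = author, Order = 1 });
    context.SaveChanges();
}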

In this case I added an extra author. I could have removed an author by deleting the appropriate BookAuthor entry, as the sketch below shows.
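A minimal sketch of such a removal, assuming the first stage also returned the key of the author to remove (AuthorIdToRemove is a hypothetical dto property):

using (var context = new EfCoreContext(options))
{
    var linkToRemove = context.Set<BookAuthor>()
        .SingleOrDefault(pa => pa.BookId == dto.BookId
            && pa.AuthorId == dto.AuthorIdToRemove);
    if (linkToRemove != null)
    {
        context.Set<BookAuthor>().Remove(linkToRemove);
        context.SaveChanges();
    }
}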

I do recommend this approach for many-to-many relationship updates because it is more efficient and still quite clear in its intent, i.e. it’s not complex, so it’s obvious what is happening.

Summary – the quicker way to update many-to-many relationships in the disconnected state

For updates to many-to-many relationships in the disconnected state you can:

  1. In the first stage you should return the primary key(s) of the two ends of the many-to-many relationship that you want to link/unlink. In my example I provided the BookId and the NewAuthorId I wanted to add to the book.
  2. In the second stage you should add/delete entries from the linking table directly. For instance, to add a new many-to-many relationship you add a new entry to the linking table. In my example I added a new BookAuthor link with its BookId property set from the BookId returned by the first stage, and its AuthorId property set to the NewAuthorId value from the first stage.

Conclusion

Setting up and changing many-to-many links in EF Core is harder than in EF6.x. EF6.x used the same approach as EF Core, but “hid” the linking table and the manipulation of that table from you. That certainly made it easier, but as the continued traffic to my original article on updating many-to-many in EF6.x shows, it can sometimes cause confusion.

At the moment EF Core makes you configure the linking table and update that linking table yourself. The EF Core roadmap says it will look at automating this in the future, but not for now. But I think there are some benefits in knowing what is going on in the database, and, as my last disconnected example shows, you can sometimes be more efficient than EF6.x is at updating many-to-many relationships.

I cover this and much more in my book Entity Framework Core in Action, and you can access the unit test code you see here via the Chapter03 branch of the Git repo that goes with the book – see the test class Ch03_ManyToManyUpdate.

Happy coding!

Architecture of Business Layer working with Entity Framework (Core and v6) – revisited

I wrote an article a while ago called Architecture of Business Layer working with Entity Framework, which has been popular. I am now playing with Entity Framework (EF) Core and some sample code made me think – could I better isolate the EF part of my Business Logic? The answer is yes, and this article describes the new and improved approach, with a lot more examples.

Note: All the code in this article is using .NET Core and Entity Framework Core (EF Core). However, the code is easy to understand for anyone who has played with either flavour of EF, and the concepts work equally well with EF 6.x or EF Core.

UPDATE 2017: This article was a precursor to the book Entity Framework Core in Action that I am writing for Manning Publishing. Chapter 4 of the book covers business logic in detail.

The book also has a companion live site where the business logic in this article is used to provide an example checkout service. Have a go – no money needed:)

What is the Business Layer?

If you are new to the idea of Business Logic, then I suggest you read the section near the top called ‘What is the Business Layer’ in my original article, as it gives a good description. However, if you are in a hurry here is the short version.

Business Logic is what we call pieces of code that carry out operations that are specific to the job that the application is carrying out. For instance, on an e-commerce site like Amazon you would have Business Logic that handles a customer’s purchase: checking on availability and delivery, taking payments, sending an email confirmation etc. Business logic is definitely a step up on complexity over CRUD (Create, Read, Update and Delete) operations.

While Business Logic can be spread throughout an application and the database, it is accepted best practice to try and isolate the Business Logic into one place, which I am calling the Business Layer.

Aside: While having all the business rules in the Business Layer is something we should always strive for, I find that in practice some issues go outside the Business Layer for good reasons. For instance, validation of data often flows up to the Presentation Layer so that the user gets early feedback. It may also flow down to the database, which checks data written to the database, to ensure database integrity.

Other business logic appears in the Presentation Layer, like not allowing users to buy unless they provide a credit card, but the status should be controlled by the Business Layer. Part of the skill is to decide whether what you are doing is justifiable or will come back and bite you later! I often get that wrong, but hopefully I learn from my mistakes.

The (improved) architecture of my Business Layer

Let me start with a diagram of the new, improved structure of the Business Logic within an application – in this case an ASP.NET web application but the same approach would work in many other application types.

BizLogic and BizDbAccess layer diagram - simplified

Next I will explain each of the main layers, with example code.

1. The DataLayer

The Data Layer is where the EF POCO (entity) classes are defined, along with the EF setup and DbContext. I am not going to describe how to create the EF entity classes or the setup of the DbContext, as I will assume you already know about EF. For information on how to set up the database and EF entities, see Microsoft’s EF documentation.

Note: I have shown the Data Layer as one assembly, but in bigger applications the EF entity classes are often in a different assembly to the EF setup code. In that case the BizLogic would only link to the EF entity classes.

2. The BizLogic Layer

I have written numerous applications, all of which have some quite complex Business Logic. Over the years I have tried different ways of organising these applications and I have come up with one key philosophy – that “The Business Layer is King“, i.e. its design and needs drive everything else.

I am a big fan of Domain-Driven Design (DDD) and Eric Evans’ seminal book on DDD, Domain-Driven Design. Eric Evans’ book talks about how the business problem we are trying to solve, called the “Domain Model”, should be “the heart of the Software” (see Eric Evans book, page 4). Eric goes on to say “When the domain is complex, this is a difficult task, calling for the concentrated effort of talented and skilled people”. Therefore, I try to make sure that the Business Logic’s data structures are defined by, and solely focused on, the business problem. How that data is stored and how that data is viewed are secondary issues.

Because of the way EF works it is not sensible to combine the Business Logic with the actual EF POCO classes. Eric Evans recognises and discusses this problem in his book (see page 159). However, lots of people have tried to make EF more DDD-like.

NOTE: There is plenty of debate on how to apply DDD to EF. Personally I did try making the EF entity classes behave in a DDD way, but for me the amount of work for the gain in separation became too much (it is easier in EF Core though). In trying to go fully DDD with EF I felt I was losing the core tenet of Eric Evans’ book, which is that the business problem is the focus, not the technology. I recommend Jimmy Bogard’s comments near the end of the EF issue ‘The impedance mismatch between EF and domain models: what is Entity Framework’s answer?’, which I think are very helpful.

My rules for the Business Logic are:

a. The Business Logic data classes should not be affected by other higher layers

Business Logic can be hard to write. Therefore, I want to stay focused on the business issue and not worry about how other layers above might need to reformat/adapt it. So, all data in or out of the Business Logic is either defined inside the BizLogic layer, or it is a Data Layer class. It is the Service Layer’s job to act as an Adapter, i.e. the Service Layer converts any data to/from what the Business Logic needs.

In new/modern databases the EF POCO (entity) classes should be a very good fit to the Business Logic needs, with a few extra properties needed to set up the database relationships, e.g. Foreign Keys. Therefore, the Business Logic can use the EF POCO (entity) classes directly without any adaption.

However, one of my readers pointed out that for old/legacy databases the fit between the database and the Business Logic may not be a good match (see the section on handling old/legacy databases later).

b. The Business Logic works on a set of in-memory data

I don’t want the Business Logic worrying about how to load and save data in the database. It is much simpler to design the Business Logic code to work on simple in-memory set of data. The changes described in this article improve this significantly over the previous article.

In the previous approach I used to have a specific part of the Business Logic, normally at the start, which loaded all of the data. However, the new approach has what I call a DbAccess class, which handles all the reading and writing (see later for why this is better). The DbAccess class is an implementation of the Facade pattern.

Example Code

UPDATE: This article was a precursor to the book Entity Framework Core in Action that I am writing for Manning Publishing. There is a Git repo to go with the book, with the Chapter04 branch containing the code you see here. Chapter 4 of the book covers the whole area of business logic in detail.

Here is an example of Business Logic which is creating the first part of the order, which consists of an Order class instance with a number of LineItems. It is handed the customer’s Id, and list of LineItems which contain an entry for each BookId, followed by how many copies of that book the customer wants.

namespace BizLogic.Orders.Concrete
{
    public interface IPlaceOrderAction : IBizActionAsync<PlaceOrderInDto, Order> { }

    public class PlaceOrderAction : IPlaceOrderAction
    {
        private readonly IPlaceOrderDbAccess _dbAccess;
        private readonly List<string> _errors = new List<string>();

        public IImmutableList<string> Errors => _errors.ToImmutableList();

        public bool HasErrors => _errors.Any();

        public PlaceOrderAction(IPlaceOrderDbAccess dbAccess)
        {
            _dbAccess = dbAccess;
        }

        /// <summary>
        /// This validates the input and if OK creates an order 
        /// and calls the _dbAccess to add to orders
        /// </summary>
        /// <returns>returns an Order. Can be null if there are errors</returns>
        public async Task<Order> ActionAsync(PlaceOrderInDto dto)
        {
            if (!dto.TsAndCsAccepted)
            {
                _errors.Add("You must accept the T and Cs to place an order.");
                return null;
            }
            if (!dto.LineItems.Any())
            {
                _errors.Add("No items in your basket.");
                return null;
            }

            var booksDict = await 
                _dbAccess.FindBooksByIdsAsync(
                    dto.LineItems.Select(x => x.BookId))
                .ConfigureAwait(false);
            var order = new Order
            {
                CustomerName = dto.UserId,
                LineItems = FormLineItemsWithErrorChecking(
                     dto.LineItems, booksDict)
            };    

            _dbAccess.Add(order);

            return order;
        }

        private List<LineItem> FormLineItemsWithErrorChecking(
             IEnumerable<OrderLineItem> lineItems, 
             IDictionary<int,Book> booksDict)
        {
           // rest of the code left out…

A few things to point out about the above.

  1. We have created a generic IBizActionAsync<TIn, TOut> interface that the PlaceOrderAction class has to implement. This is important, as we use a generic BizRunner to run all the actions (described later in the Service Layer section). The IBizActionAsync interface ensures we have the Errors and HasErrors properties as well as the ActionAsync method (a sketch of this interface follows this list).
  2. It uses the IPlaceOrderDbAccess provided via the constructor to a) find the books that were referred to in the order and b) to add the order to the EF db.Orders set.
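The IBizActionAsync interface isn’t listed in this article, but based on the members used above a minimal sketch of it could look like this (the actual definition may differ):

public interface IBizActionAsync<TIn, TOut>
{
    IImmutableList<string> Errors { get; }
    bool HasErrors { get; }
    Task<TOut> ActionAsync(TIn dto);
}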

NOTE: I also set up an interface, IPlaceOrderAction. I need this to get Dependency Injection (DI) to work, but I won’t be describing DI in this article. However, I really recommend the use of DI on any non-trivial application.

3. The BizDbAccess Layer (new)

The BizDbAccess layer contains a corresponding class for each BizLogic class that accesses the database. It is a very thin Facade over the EF calls. I should stress, I am not trying to hide the EF calls, or make a repository. I am just trying to isolate the calls. Let me explain.

The problem with my previous approach, where the Business Logic called EF directly, wasn’t that it didn’t work, but that it was hard to remember where all the EF calls were. When it came to refactoring, especially when doing performance improvements, then I had to hunt around to check I had got every EF call. By simply moving all the EF calls into another class I have all of the EF commands in one place.

The main approach I will describe assumes that the database is of a new/modern design and there is a good fit between the data held in the database and the Business Logic requirements. The code below is a PlaceOrderDbAccess class, which goes with the PlaceOrderAction Business Logic described above:

public class PlaceOrderDbAccess : IPlaceOrderDbAccess
{
    private readonly EfCoreContext _db;

    public PlaceOrderDbAccess(EfCoreContext db)
    {
        _db = db;
    }

    /// <summary>
    /// This finds any books that fits the BookIds given to it
    /// </summary>
    /// <returns>A dictionary with the BookId as the key, and the Book as the value</returns>
    public async Task<IDictionary<int, Book>> FindBooksByIdsAsync(IEnumerable<int> bookIds)
    {
        return await _db.Books.Where(x => bookIds.Contains(x.BookId))
            .ToDictionaryAsync(key => key.BookId)
            .ConfigureAwait(false);
    }

    public void Add(Order newOrder)
    {
        _db.Orders.Add(newOrder);
    }
}

Please note that I use a method called Add, to add the order to the EF Orders DbSet. As I said, I am not trying to hide what is going on, but just isolate the commands, so using the same terminology is helpful.

NOTE: To help with finding the relevant DbAccess class, I call the Business Logic class <name>Action and the DbAccess class <name>DbAccess. They also have the same top-level directory name. See the attached picture of the solution directories on the right.

A secondary, but very powerful, improvement is that I can test my Business Logic without using an EF database. Because I provide an interface, I can replace the DbAccess class with a Mock. Here is my Mock for PlaceOrderDbAccess.

public class MockPlaceOrderDbAccess : IPlaceOrderDbAccess
{
    public ImmutableList<Book> Books { get; private set; }

    public Order AddedOrder { get; private set; }
    public MockPlaceOrderDbAccess()
    {
        Books = CreateTestData.CreateBooks().ToImmutableList();
    }

    /// <summary>
    /// This finds any books that fits the BookIds given to it
    /// </summary>
    /// <param name="bookIds"></param>
    /// <returns>A dictionary with the BookId as the key, and the Book as the value</returns>
    public async Task<IDictionary<int, Book>> FindBooksByIdsAsync(
        IEnumerable<int> bookIds)
    {
        return Books.AsQueryable()
            .Where(x => bookIds.Contains(x.BookId))
            .ToDictionary(key => key.BookId);
    }

    public void Add(Order newOrder)
    {
        AddedOrder = newOrder;
    }
}

Being able to test all your Business Logic that way is very useful.
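For example, a Unit Test of PlaceOrderAction using the Mock could look something like the sketch below (the OrderLineItem property names, such as NumBooks, are assumptions):

[Fact]
public async Task TestPlaceOrderActionWithMockOk()
{
    //SETUP - no database is needed, just the Mock
    var mockDbAccess = new MockPlaceOrderDbAccess();
    var action = new PlaceOrderAction(mockDbAccess);
    var lineItems = new List<OrderLineItem>
    {
        new OrderLineItem
        {
            BookId = mockDbAccess.Books.First().BookId,
            NumBooks = 1 //assumed name for the quantity property
        }
    };

    //ATTEMPT
    var order = await action.ActionAsync(
        new PlaceOrderInDto(true, "TestUser", lineItems));

    //VERIFY
    action.HasErrors.ShouldEqual(false);
    mockDbAccess.AddedOrder.ShouldNotBeNull();
}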

Handling old/legacy databases

One of my readers has been using a similar approach to the one described in this article for some time. He pointed out in one of his comments that he finds this approach useful when the database tables are not a good fit for the Business Logic.

At that point, in addition to isolating the EF commands, the BizDbAccess class can carry out any adapting/reformatting of the data between the database and the Business Logic.

For example, I have come across a database which did not mark its relationship keys as Foreign Keys. This stops EF from doing relational fixup, i.e. linking the various entities loaded. In that case you would add code to the DbAccess class to link the loaded entities, so that the Business Logic can work on a linked, in-memory set of classes. The sketch below shows the idea.
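A minimal sketch of that idea, using hypothetical Order/LineItem classes where the legacy database has no foreign key defined between the two tables:

public async Task<Order> LoadOrderWithLinesAsync(int orderId)
{
    var order = await _db.Orders
        .SingleAsync(o => o.OrderId == orderId);
    //EF cannot fix up this relationship itself because the legacy
    //database has no foreign key, so we link the entities by hand
    order.LineItems = await _db.LineItems
        .Where(l => l.OrderId == orderId)
        .ToListAsync();
    return order;
}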

The Service Layer

As I said earlier, in my applications the Service Layer is a very important layer that links everything together. The Service Layer uses both the Command pattern and the Adapter pattern between the Presentation Layer and all the layers below.

Let me start by showing you one of what I call the BizRunner classes, which is the Command part of the code. This class contains the code to run the PlaceOrderAction method above, which has a signature of taking data in and producing data out. It also needs data to be written to the database, hence the format/content of this BizRunner:

public class RunnerWriteDbAsync<TIn, TOut>
{
    private readonly IBizActionAsync<TIn, TOut> _actionClass;
    private readonly DbContext _db;

    public RunnerWriteDbAsync(IBizActionAsync<TIn, TOut> actionClass, DbContext db)
    {
        _db = db;
        _actionClass = actionClass;
    }

    public async Task<TOut> RunActionAsync(TIn dataIn)
    {
        var result = await _actionClass.ActionAsync(dataIn).ConfigureAwait(false);
        if (!_actionClass.HasErrors)
            await _db.SaveChangesAsync();

        return result;
    }
}

The RunnerWriteDbAsync BizRunner is a simple, but powerful class which, through the use of interfaces and generics can run different Business Logic.

One thing to note in this arrangement is that only the BizRunner’s RunActionAsync method should call the EF command SaveChangesAsync. That is important for a number of reasons:

  1. If any of the BizLogic/BizDbAccess methods called SaveChanges arbitrarily then you could get problems, with part of the data being saved while another part wasn’t. By having only the BizRunner call SaveChanges, you know that all the Business Logic has finished before anything is saved.
  2. A subtler (and more advanced) reason is it allows us to chain together multiple Business Logic calls within a transaction (I describe this in the article ‘Architecture of Business Layer – Calling multiple business methods in one HTTP request’).
  3. The handling of errors is important, and you can see that the BizRunner will only call EF’s SaveChanges if there are no errors.
  4. EF6.x and EF Core differ on what happens when you call SaveChanges.
    1. In EF6.x it validates the data against any validation attributes, such as [MaxLength(20)], and against IValidatableObject if present. This means you will get a DbEntityValidationException if any validation error is found, with the exception containing validation errors for every incorrect property or class that was added or updated.
    2. EF Core uses a lighter touch and does not validate the data, because in many cases the data has already been validated on input. Clearly that isn’t the case with business logic, so you may want to add this feature back to EF Core by overriding SaveChanges and adding validation – see this article for how to do that, and the sketch below.
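As a rough sketch of the idea, simplified from the approach in the linked article, you could override SaveChanges in your EF Core DbContext and run the DataAnnotations validator yourself:

public class EfCoreContext : DbContext
{
    //... DbSets and configuration left out

    public override int SaveChanges()
    {
        var entriesToCheck = ChangeTracker.Entries()
            .Where(e => e.State == EntityState.Added
                     || e.State == EntityState.Modified);
        foreach (var entry in entriesToCheck)
        {
            //throws a ValidationException if any validation
            //attribute, e.g. [MaxLength(20)], is not satisfied
            Validator.ValidateObject(entry.Entity,
                new ValidationContext(entry.Entity),
                validateAllProperties: true);
        }
        return base.SaveChanges();
    }
}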

Note: The discarding of data by not calling SaveChanges only works in situations where each call has its own DbContext. This is the case in a web application, as each HTTP request gets a new DbContext. However, in Windows applications etc., where the DbContext can be kept alive for longer, you need to be careful about the lifetime/disposal of the DbContext.

Here is the code inside the Service Layer that uses the BizRunner to call PlaceOrderAction. You can see this acts as an Adapter pattern to the input and output of the Business Logic. The method:

  1. Gets the current basket from a cookie and converts it into the format that the Business Logic needs.
  2. Calls the BizRunner’s RunActionAsync method, which runs the PlaceOrderAction code.
  3. If the RunActionAsync method was successful, it clears the basket, as the order now holds those items.
  4. It extracts and returns the OrderId from the Order instance, as that is what the Presentation Layer needs.

The code is:

public class PlaceOrderService
{
    private readonly CheckoutCookie _checkoutCookie;
    private readonly EfCoreContext _db;

    public IImmutableList<string> Errors { get; private set; }

    public PlaceOrderService(IRequestCookieCollection cookiesIn, IResponseCookies cookiesOut, EfCoreContext db)
    {
        _db = db;
        _checkoutCookie = new CheckoutCookie(cookiesIn, cookiesOut);
    }

    /// <summary>
    /// This creates the order and, if successful clears the cookie
    /// </summary>
    /// <returns>Returns the OrderId, or zero if errors</returns>
    public async Task<int> PlaceOrderAsync(bool tsAndCsAccepted)
    {
        var checkoutService = new CheckoutCookieService(_checkoutCookie.GetValue());

        var action = new PlaceOrderAction(new PlaceOrderDbAccess(_db));
        var runner = new RunnerWriteDbAsync<PlaceOrderInDto, Order>(action, _db);

        var order = await runner.RunActionAsync(
            new PlaceOrderInDto(tsAndCsAccepted, 
            checkoutService.UserId, checkoutService.LineItems));
        Errors = action.Errors;

        if (action.HasErrors) return 0;

        //successful so clear the cookie line items
        checkoutService.ClearAllLineItems();
        _checkoutCookie.AddOrUpdateCookie(
             checkoutService.EncodeForCookie());

        return order.OrderId;
    }
}

You can see the PlaceOrderAction instance being created, along with its linked PlaceOrderDbAccess class. This would normally be done by Dependency Injection, but to make the example simpler to understand I have created it by hand.

You can see the adapter pattern in progress. On input the information is contained in a cookie, but the business layer shouldn’t have to deal with that. Therefore, the Service Layer method PlaceOrderAsync converts the cookie content into the form that the PlaceOrderAction wants.

Similarly, for the output, the PlaceOrderAction Business Logic returns an Order instance, but the Presentation Layer actually wants the OrderId. Again, the PlaceOrderAsync method extracts the OrderId from the Order and returns that to the MVC Action.

Interestingly, the pattern of returning a database class instance is quite typical, as the Business Logic won’t know what the entity’s key is – it isn’t generated until the BizRunner in the Service Layer calls EF’s SaveChanges. It then becomes the job of the Service Layer to extract/adapt the parts that are needed by the Presentation Layer.

NOTE: The BizRunner and the code above works fine, but is simpler than what I use in practice. I have a private library called GenericActions, which has the same sort of features as my GenericServices library. This can identify the right signature of BizRunner to call by unpacking the Interface parts. GenericActions also uses the same ISuccessOrErrors<T> class as GenericServices, which returns both the status/errors and the class being returned. It also has the AutoMapper mapping built in to do any needed adapting.

Presentation layer

For completeness I have included the presentation layer, which in this case is an ASP.NET Core MVC page. If the order is successful it redirects to the confirmation page, otherwise it copies the errors into the ModelState and shows the basket again, with the error message(s).

public async Task<IActionResult> PlaceOrder(bool iAcceptTAndCs)
{
    var service = new PlaceOrderService(
        HttpContext.Request.Cookies, 
        HttpContext.Response.Cookies, _db);
    var orderId = await service.PlaceOrderAsync(iAcceptTAndCs);

    if (!service.Errors.Any())
        return RedirectToAction("ConfirmOrder", "Orders", 
             new { orderId});

    //Otherwise errors, so copy over and redisplay
    foreach (var error in service.Errors)
    {
        ModelState.AddModelError("", error);
    }
    var listService = new CheckoutListService(_db, 
        HttpContext.Request.Cookies);
    return View(listService.GetCheckoutList());
}

I try to keep my Controller Actions small and simple by putting as much logic as possible into the Service Layer or lower layers. I do this because it is hard to test Controller actions, but much easier to Unit Test the Service Layer or lower layers.

Quick aside – handling errors needs thought.

This article is already quite long, so I won’t dwell on this subject, but providing good error feedback to the user is not trivial. Many libraries use Exceptions to report errors – some of these errors are user friendly, like Validation Errors. However, many Exceptions, such as EF’s DbUpdateException, are very unfriendly and sometimes produce messages that relate to the inner workings of the application, and hence should not be shown for security reasons.

In this article I have implemented a very simple error feedback mechanism. In practice you would need to implement a much more comprehensive approach. I suggest you look at my open-source library, GenericServices, which uses ISuccessOrErrors<T> to return a result with possible error messages.

Conclusion

Writing CRUD (Create, Read, Update and Delete) methods used to take a lot of (boring) development time writing boiler-plate code. I solved that by writing a library called GenericServices, which has radically improved the speed of development (see this CRUD-only site, which I wrote in about 10 days).

I now find I spend most of my development effort on a) the Business Logic and b) building responsive user interfaces. Therefore, continuing to improve my approach to writing Business Logic is well worth the effort, which is what this article is all about. (See a different article on how I am improving the building of responsive user interfaces too).

I recently reviewed an e-commerce application I developed which used the older approach to writing Business Logic, i.e. EF called in the Business Logic. The Business Logic was quite complex, especially around pricing and delivery. My conclusion was that the Business Logic worked fine, but it was sometimes difficult to find and understand the database accesses when refactoring and performance tuning the Business Logic.

I tried out the new approach described in this article by refactoring some of the Business Logic in this e-commerce application over to the new approach. Overall it did make a difference, bringing all the EF access code into one clear group per Business Logic case.

In addition I wrote an example application using ASP.NET Core and EF Core with this new approach, again with good results (this is where the example code in this article comes from). Couple this with the other positive feature of being able to test the Business Logic with a Mock DbAccess class, rather than accessing EF directly, and it is clearly a good step forward.

The only downside to this new approach is that it does need one more class than the previous approach. However, that class does provide a much better separation of concerns. Therefore, I think this new approach is a positive improvement on my previous approach to building Business Logic and is worth the extra effort of another class.

I hope this article gives you some ideas on how to better design and build complex Business Logic in your applications.

Flattening Entity Framework relationships with ExpressMapper

This article looks at how Microsoft’s Entity Framework (EF) handles extracting data from nested relationships, i.e. linked EF classes. This is a process known as flattening. I will show that EF produces efficient SQL commands for this type of access.

I then talk about making the development of such accesses easier by using an Object-to-Object mapper, and introduce the new Flattening feature in ExpressMapper, a relatively new mapper that is quite lean and quick. The last part is a detailed description, with lots of examples, of how to use flattening in ExpressMapper and what it is capable of.

Quick background on flattening

If you understand the issues around relational databases, foreign keys and flattening then skip this.

Well constructed relational databases try to minimise the duplication of data, e.g. only have one copy of the customer details. They do this by creating extra tables to hold each of these unique pieces of data and using ‘foreign keys’ if another piece of data needs to refer to it. Let me give you an example.

In the diagram below, which shows the EF classes, a product has a ‘Variant’, which is linked by the foreign key ‘VariantId’, and that Variant has size information, given by the linked ‘Size’ class via the foreign key ‘SizeId’.

Product, ProductVariant, Size

The benefit of splitting the data like this is that we can change, say, the Color information on a specific ProductVariant and it is automatically picked up by all the products that point to it. In software terms we would call this an application of the DRY (Don’t Repeat Yourself) principle.

The downside is that if we want to show a list of products on screen it’s quite likely that we want to include the Color and Size information. This means we need to ‘pick out’ the Color column from the linked ‘ProductVariant’ table, and the Name column in the Variant’s linked ‘Size’ table. This is called flattening, and it happens a lot in real-life applications.

How Entity Framework does flattening

I’m going to use a very simple example to show how EF does flattening. The diagram below shows a simple Father->Son->Grandson nested relationship.

ExpressMapper-father-son-grandson

The EF command to get all of the properties from all of the linked records would be:

var flattened = dbContext.Fathers.Select(f => new { 
    f.MyInt,
    f.MyString, 
    SonMyInt = f.Son.MyInt,
    SonMyString = f.Son.MyString, 
    GrandsonMyInt = f.Son.Grandson.MyInt,
    GrandsonMyString = f.Son.Grandson.MyString});

The result is an anonymous class containing all the data from all the relationships.

The T-SQL EF produces for this access

It turns out that EF produces a very efficient T-SQL command for this sort of flattening. The T-SQL command that EF puts out is as follows:

SELECT 
    [Extent1].[MyInt] AS [MyInt], 
    [Extent2].[MyInt] AS [MyInt1], 
    [Extent2].[MyString] AS [MyString], 
    [Extent3].[MyInt] AS [MyInt2], 
    [Extent3].[MyString] AS [MyString1], 
    [Extent1].[MyString] AS [MyString2]
    FROM   [dbo].[Father] AS [Extent1]
    INNER JOIN [dbo].[Son] AS [Extent2] 
        ON [Extent1].[Son_Id] = [Extent2].[Id]
    LEFT OUTER JOIN [dbo].[Grandson] AS [Extent3] 
        ON [Extent2].[Grandson_Id] = [Extent3].[Id]

This is an optimal T-SQL command, and the execution plan proves it.

NOTE: Flattening is only useful for reading data, as any update to a column in a related table can only be done in EF by reading in the inner class, updating its data and calling EF’s .SaveChanges(). However, reading data for display is very common, and therefore we want it to be really efficient.

Using an Object-to-Object Mapper with EF

My experience has been that you spend a LOT of time writing EF .Select() statements like the one above in even a small application. They are repetitive and boring, which often leads to mistakes. This caused me to build the open-source package GenericServices, which uses an object-to-object mapper (currently AutoMapper, but I am planning to change to ExpressMapper).

The idea behind using an object-to-object mapper (referred to as a mapper from now on) is that you create a class, or classes, which represent what you need from the database. These classes are often called DTOs (Data Transfer Objects), or in ASP.NET MVC style, ViewModels – I will refer to them as DTOs in this article.

The role of the mapper is to work out how to map the EF classes to the DTO(s). The mapper does this based on a set of rules. Let’s consider one implementation of an object-to-object mapper that will work with EF – ExpressMapper.

Introducing ExpressMapper

As I said earlier I have used AutoMapper, but I have been tracking the development of a new mapper called ExpressMapper, which has some advantages. ExpressMapper is newer than AutoMapper and relies on building LINQ commands for all its mappings. The main advantage is that its setup is much faster than AutoMapper, and one or two other small things make it fit with EF more easily.

However, while I wanted to start using ExpressMapper, it didn’t have the flattening feature. So, with the help of ExpressMapper’s author, Yuriy Anisimov, I added the flattening feature. This is now released in version 1.8.1, which you can find as a NuGet package here.

I will give you three examples of how ExpressMapper’s flattening works and then describe the differences between ExpressMapper and AutoMapper, which is used more widely because it has been around a long time.

1. Flattening nested references

1.a. Non-null nested references

For flattening, ExpressMapper and AutoMapper use a concatenated name rule, e.g. SonMyInt would match Son.MyInt and SonGrandsonMyString would match Son.Grandson.MyString. So, looking at our previous example of Father->Son->Grandson, if we created a class as shown below and then ran the code at the end, ExpressMapper would produce the same sort of result, and the same SQL, as my hand-coded select did.

public class FlattenDto
{
   public int MyInt { get; set; }
   public string MyString { get; set; }

   public int SonMyInt { get; set; }
   public string SonMyString { get; set; }

   public int SonGrandsonMyInt { get; set; }
   public string SonGrandsonMyString { get; set; }
}

...
//code to set up the mapping 
ExpressMapper.Mapper.Register<Father, FlattenDto>().Flatten();
ExpressMapper.Mapper.Compile(CompilationTypes.Source);
//code to execute the EF command and get the data
var flattened = dbContext.Fathers.Project<Father, FlattenDto>().ToList(); 

The SQL code is the same as I showed for the hand-written code.

1.b. Nullable nested references

If either the Son or the Grandson can be null then these relationships are called an ‘optional relationship’ in EF, or One-to-ZeroOrOne references in SQL. ExpressMapper handles null references, so mapping them is fairly straightforward. However, you do need to remember that some of the DTO properties may be null if a nested relationship is null, otherwise EF will complain that it cannot put a null in them.

For example, if we take our previous example and make the Grandson an optional relationship with the following EF configuration (see the HasOptional call), then the Grandson property may be null.

public class SonConfiguration 
   : EntityTypeConfiguration<Son>
{
   public SonConfiguration()
   {
      ToTable("Son");
      HasKey(t => t.Id);
      HasOptional(t => t.Grandson);
   }
}

Having done that, ExpressMapper will handle not having a grandson (or a son, if you set that as optional too). However, we do need to change the DTO, as the SonGrandsonMyInt property can now be null. The actual class is in the Unit Tests of ExpressMapper; a sketch of what it might look like is below.
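This sketch of the FlattenDtoWithNullable class is based on the FlattenDto shown earlier (the actual Unit Test class may differ) – the nullable SonGrandsonMyInt is the important part:

public class FlattenDtoWithNullable
{
   public int MyInt { get; set; }
   public string MyString { get; set; }

   public int SonMyInt { get; set; }
   public string SonMyString { get; set; }

   //nullable, because the Grandson relationship is optional
   public int? SonGrandsonMyInt { get; set; }
   public string SonGrandsonMyString { get; set; }
}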

//code to set up the mapping 
ExpressMapper.Mapper
    .Register<Father, FlattenDtoWithNullable>().Flatten();
ExpressMapper.Mapper.Compile(CompilationTypes.Source);
//code to execute the EF command and get the data
var flattened = dbContext.Fathers
    .Project<Father,FlattenDtoWithNullable>().ToList(); 

The result of this will be that SonGrandsonMyInt and SonGrandsonMyString will both be null if no Grandson relationship exists, but will contain the Grandson’s MyInt and MyString if the relationship does exist.

NOTE: AutoMapper does not have this feature by default – see later section of differences between ExpressMapper and AutoMapper.

2. Running LINQ commands on collections

As well as accessing columns in one-to-one relationships, there is another feature that EF and ExpressMapper’s flattening can provide for one-to-many relationships. That is, it can convert a small but useful number of LINQ collection commands into SQL commands. The commands that EF supports are:

  • Any: This returns true if there is one or more rows in the collection.
  • Count: This returns an int of the number of rows in the collection.
  • LongCount: Same as Count, but returns a long (64 bit) type.

To use these you simply add the method name on the end of an EF POCO class property that implements IEnumerable (but not a string). Let me give you a very simple example. The code below shows an EF POCO class called FatherSons which contains a collection called Sons. The DTO below it has a property called SonsCount, which ExpressMapper turns into the LINQ command Sons.Count().

public class FatherSons
{
   public int Id { get; set; }

   public int MyInt { get; set; }
   public string MyString { get; set; }

   public ICollection<Son> Sons { get; set; }
}
public class FlattenFatherSonsCountDto
{
   public int MyInt { get; set; }
   public string MyString { get; set; }

   public int SonsCount { get; set; }
}

...
//code to set up the mapping 
ExpressMapper.Mapper
    .Register<FatherSons, FlattenFatherSonsCountDto>().Flatten();
ExpressMapper.Mapper.Compile(CompilationTypes.Source);
//code to execute the EF command and get the data
var flattened = dbContext.Fathers
    .Project<FatherSons, FlattenFatherSonsCountDto>().ToList(); 

The resulting SQL is again very good (see below):

 
SELECT 
    [Extent1].[Id] AS [Id], 
    (SELECT 
        COUNT(1) AS [A1]
        FROM [dbo].[Son] AS [Extent2]
        WHERE [Extent1].[Id] = [Extent2].[FatherSons_Id]) AS [C1], 
    [Extent1].[MyInt] AS [MyInt], 
    [Extent1].[MyString] AS [MyString]
    FROM [dbo].[FatherSons] AS [Extent1]

Full list of ExpressMapper flattening features

  1. Flattening looks at properties inside classes, either EF relationships or EF Complex Types. For the DTO to match it must match on two counts:
    1. The Name must match: The DTO property name should be a concatenation of the names without the dots, e.g. Variant.Size.Name is accessed by having a property called VariantSizeName in the DTO.
    2. The Type must match: The type of the DTO property must be either:
      1. The same type as the inner property, e.g. SonMyInt can be of type int.
      2. A nullable version of the type of the inner property, e.g. SonGrandsonMyInt is of type Nullable<int>. This is useful when you have optional relationships (see 3 below).
      3. A class that has been registered with ExpressMapper as mapping between the inner property and the DTO class (see 2 below for a better explanation!)
  2. You can have nested DTOs within DTOs. For instance in our Father->Son->Grandson examples you can have a property called SonGrandson, which is a DTO class. The only rule is you must also register the SonGrandson DTO class as well as the outer DTO. There is an example of this in the ExpressMapper Unit Tests – see test on line 95 of FlattenEntityTests.cs and note the registering of the extra DTO class on line 27 of the same file.
  3. Flattening can handle optional relationships, i.e. null references to a class. In that case it will return null for all the properties you access in the nested class(es). See example 1.b above.
  4. Flattening can handle circular references, e.g. if class person has a property called Boss of type person then you can have a DTO property such as BossBossBoss to get the third boss up (or null if the hierarchy doesn’t go that high).
  5. Flattening also looks at collection properties, i.e. one-to-many relationships. If the DTO contains a property with the name of the collection, plus an ending of Any, Count or LongCount, then it adds the LINQ method of that name. The property must be of the right type, i.e. bool, int or long respectively, otherwise you get an ExpressMapperException.

Configuring flattening in ExpressMapper

  • Flattening only happens if you add the method .Flatten() to the registration of the mapping.
  • If you use ExpressMapper’s .Ignore() or .Member() methods in a registration with Flattening, then your .Ignore() or .Member() calls will take over from flattening, i.e. Flattening will not touch those source properties. The order in which you apply .Flatten, .Ignore and .Member does not matter (see the sketch after this list).
  • The comparison of names follows the ExpressMapper configuration, which defaults to case insensitive. For instance, in the Variant.Size.Name case the match would work with variantsizename or any other upper/lower sequence of the same letters.
    NOTE: this is useful when you are mapping to json in an API and you often want the first letter of the property to be lower case.
  • Flattening only works on the Source side, i.e. it does not ‘unflatten’ if you do a DTO->class mapping. I did not implement this because it just doesn’t work with EF, which needs you to link to tracked classes. It is also very difficult to get right in all cases, e.g. what do you do with a Count?
    NOTE: That is why you see the command ExpressMapper.Mapper.Compile(CompilationTypes.Source) in the examples. Because I only use the class->DTO direction then I don’t bother to compile the reverse direction.
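For example, a registration that mixes .Flatten() with explicit .Member() and .Ignore() calls could look like the sketch below (the particular mappings are illustrative):

ExpressMapper.Mapper.Register<Father, FlattenDto>()
    //this explicit mapping takes precedence over flattening
    .Member(dest => dest.SonMyInt, src => src.Son.MyInt)
    //flattening will not touch this ignored property
    .Ignore(dest => dest.MyString)
    .Flatten();
ExpressMapper.Mapper.Compile(CompilationTypes.Source);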

Comparison with AutoMapper

Many people, including me, are already using AutoMapper, so it is useful to point out the differences.

  1. AutoMapper is a bit more specific about matching the concatenated names, and has some extra features for changing this. My implementation in ExpressMapper simply matches the next part of the string as it goes, plus the type. I think AutoMapper would not accept vAriantsiZenAme, but ExpressMapper’s Flatten will.
  2. AutoMapper has other flattening features, like GetMethod. ExpressMapper’s Flattening does not support that as they don’t work with EF.
    NOTE: The way to overcome this is either by specific .Member() set-ups or, more elegantly, by using DelegateDecompiler. DelegateDecompiler allows you to add ‘computed’ properties which it can turn into LINQ code that EF can execute. I use DelegateDecompiler in my GenericServices package and it is very helpful. You should look it up!
  3. AutoMapper does not, by default, handle nullable nested references (see example 1.b above). This is a plus for ExpressMapper.

Conclusion

I wanted to get three main things across in this article:

  1. Entity Framework creates very efficient SQL code when accessing nested One-to-One or One-to-ZeroOrOne relationships.
  2. Object-to-Object mappers can make your life a lot easier in writing EF POCO classes to DTO code.
  3. ExpressMapper now has a good Flattening feature.

When I have the time (!) I plan to update my GenericServices project to use ExpressMapper. It has some nice features, and my very simple testing says it’s about eight times faster on setup, i.e. the registering of the mappings, than AutoMapper. However, ExpressMapper is about the same speed as AutoMapper on the actual mapping of the data.

I should say that EF doesn’t do so well at producing efficient T-SQL commands if you use EF’s .Include() method to eager load properties that are collections. I would like to write about that later if I have time, as there are definitely some things to watch out for in cases like that.

Well done for getting to here. The article is a bit long but hopefully useful.

Happy coding!

Handling Entity Framework database migrations in production – part 4, release of EfSchemaCompare

Some months ago I wrote an article on Simple-Talk ‘Deploying an Entity Framework Database into Production‘, with a more detailed series on this blog site (see list below). I have finally found time to release the package I mentioned in the articles as an open-source (MIT) project and also on NuGet as EfSchemaCompare.EF6.

In this article I explain where EfSchemaCompare is useful and give three real-world uses of the package from my own e-commerce application development. The whole series deals with the issues of creating/updating a database schema in a robust and testable way. The list of articles is:

Where would you find EfSchemaCompare useful

The main reason to use EfSchemaCompare is if you use EF with a database but also want to:

  1. Take over the creation, definition or migration of the database rather than having EF handle it.
  2. You want to build an EF model that works with an existing database.

If you work this way then you have a problem: you need to make sure any changes to the database are matched by changes to the EF classes/configuration, or vice versa. For instance, if you added a foreign key relationship in your database, or a new column to a table, then you need to make the corresponding changes to the EF classes/configuration.

The important thing to see is that, other than when using EF reverse engineering (see next paragraph), EF never looks at the schema of the database, even when it is doing migrations. It just assumes the database is in the state that EF’s metadata model says it is. (For more on how this works under the hood see Max Vasilyev’s interesting article about how EF migrations are calculated and stored).

So one way to overcome any mismatch between EF and SQL is to re-import the database via EF’s reverse-engineering option. However, I have found that to be a rather blunt instrument, as it adds all the possible relationships and also uses fixed names for properties.

However now EfSchemaCompare gives you a second option. EfSchemaCompare is designed to check that EF’s view of the database is in line with what SQL says the database schema is. You can use it in your Unit Tests, or in your deployment scripts. It returns a go/no-go response with useful error/warning messages to spot what is wrong. It can also help you work out how to set up SQL tables to handle some of the more complex EF table relationships, like Many-to-Many tables (see example 3 near the end).

Let me give you three examples of how I use EfSchemaCompare.

Examples of how I use EfSchemaCompare

I was building an e-commerce web site on Azure and I wasn’t comfortable that EF’s Code First Migrations approach was robust enough in my situation (read the first article for more details on why). I therefore swapped over to using DbUp, which is a script-based database deployment/update tool.

However, it turns out that matching a database change to the form that EF needs is quite complex! I therefore wrote EfSchemaCompare. I will detail three different ways I use EfSchemaCompare in my Unit Tests and deployment tests.

1. General checking in Unit Tests

The NUnit Test below is part of my standard tests and it simply checks that EF’s current model of the database matches the database my Unit Test has.

[Test]
public void CompareEfSqlTestLabelAppDbOk()
{
    using (var db = new MyDbContext())
    {
        var comparer = new CompareEfSql();
        var status = comparer.CompareEfWithDb(db);

        Assert.True(status.IsValid, status.GetErrors());
        Assert.False(status.HasWarnings, string.Join("\n", status.Warnings));
    }
}

The method it uses, CompareEfSql.CompareEfWithDb, is fairly quick, assuming the MyDbContext has been used in the Unit Tests, and finds 90% of the issues that can occur. It also gives error messages that talk about the EF classes rather than in SQL terms, so it’s easier for software developers to understand.

Note: The above version assumes the EF POCO classes are in the same assembly as the EF DbContext. There is another version if you have your EF POCO classes in a different assembly to the EF DbContext (I do!). There is also another version if you have multiple DbContexts covering one database.

2. Detailed checking before I release to production

The NUnit Test below checks my production SQL database against the development database. The SQL to SQL checking is more thorough, but only checks things that EF cares about (see list of limitations).

[Test]
public void CompareLabelAppDbWithAzureTestLabelAppDbOk()
{
    using (var db = new MyDbContext())
    {
        var comparer = new CompareSqlSql();
        var status = comparer.CompareSqlToSql( "LabelAppDb", "AzureLabelAppDb");

        Assert.True(status.IsValid, status.GetErrors());
        Assert.False(status.HasWarnings, string.Join("\n", status.Warnings));
    }
}

The method it uses, CompareSqlSql.CompareSqlToSql, is very quick. It takes either a connection string name or a full connection string. The error messages are all about SQL, so they can be harder to interpret if you aren’t used to SQL, but it’s comprehensive.

Note that I use the stricter version, where my Unit Test fails on warnings (see the Assert.False on warnings in the test above). Warnings are things that should not affect EF, but are worth knowing about. You may not want to be so strict, but in my application I am. See the documentation on the difference between errors and warnings.

3. When I am making complex changes to the database

Some changes are really easy, like adding a new column to a table. However when you change relationships it can be a challenge to get the SQL and EF classes in step. There is another command, CompareEfGeneratedSqlToSql, which is very useful.

This generates a brand new database using EF, with the name of the DbContext but with ‘.EfGenerated’ appended on the end. It then compares this with your current database using SQL-to-SQL testing.

[Test]
public void CompareEfGeneratedSqlWithLabelAppDbOk()
{
    using (var db = new MyDbContext())
    {
        var comparer = new CompareSqlSql();
        var status = comparer.CompareEfGeneratedSqlToSql(db, "LabelAppDb");

        Assert.True(status.IsValid, status.GetErrors());
        Assert.False(status.HasWarnings, string.Join("\n", status.Warnings));
    }
}

The method it uses, CompareSqlSql.CompareEfGeneratedSqlToSql, is quite slow because it gets EF to create a new database. However, it does provide a way to a) compare exactly what EF would have done against what your SQL database looks like, and b) get an EF-generated database, built from your current EF classes/configuration, to inspect yourself.

Some things, like EF’s Many-to-Many relationships, require the SQL database to have a new table with a specific name and setup. Failing to follow the style that EF needs will mean EF won’t fill in the many-to-many links. Adding the right tables is often a case of seeing what EF does by inspecting the ‘.EfGenerated’ database and copying the SQL EF generated into your new database (or implementing your own many-to-many handling of course, which I cover in Part 2 of this article).

The other thing I noticed is that EF is somewhat forgiving if you make mistakes when you use EF’s Data Annotations or EF’s Fluent API to alter the way the database works. For instance, if you configure a required-to-optional relationship but give it a non-nullable foreign key, EF will add its own hidden nullable foreign key. In fact I found about three errors like this in my 22-table application, all around relationships.

EfSchemaCompare will point out these discrepancies so that you can fix them. Minor configuration mistakes like these can be found mainly by looking at the column names, which normally contain a name with a _ in the middle. I have also found places where the Cascade Delete option was different, again through me misconfiguring a relationship.

Conclusion

Hopefully this article gives you a feel for why you might find the NuGet package EfSchemaCompare.EF6 useful, plus some tips on how to use it in your application. It’s certainly a niche tool, but if you are using EF on a database that is not controlled by EF then it’s a life-saver. Do have a look at the project on GitHub for documentation and Unit Tests that might give you more ideas on how to use it.

Finally, I wanted to thank the readers who field tested the first version, including Hakon Thomas. Their feedback has shaped the tool and hopefully made the documentation a bit clearer too.

Happy coding.


Want some help with EF?

Are you looking for someone to help you use Entity Framework more effectively in your business? I am a contract software developer / architect with many years of experience and I am happy to work remotely.

Have a look at my Hire Me page and drop me an email via my contact me form to see if I can help.


 

Handling Entity Framework database migrations in production – part 2, Keeping EF and SQL schema in step

This is the second article in a series about using Microsoft’s data access technology, Entity Framework (EF), in an application that has to be ‘always on’, in my case an ASP.NET MVC e-commerce web site, yet is still being developed, so there is a need to update the database schema, i.e. the tables, views, constraints etc. that define how the data is held in the database.

In this article I look at the problem of keeping EF’s view of the database schema, known as the EF database model, in step with the actual SQL database schema. The list of articles will be:

I will start by talking about how EF works, especially around how EF builds its in-memory view of the database, known as the database model, and what happens when it applies database migrations.

Note: Please see the first article for my reasons why I decided EF’s standard migration feature was not sufficient.

Matching Entity Framework changes to Database changes

EF is great to develop with as it does a number of clever things in the background to help developers. One of the most useful things EF does is make a relational database look like a set of linked classes. This is very clever, but to do this EF’s view of the database relationships must match the reality of what is in the database. To understand why this is a problem I need to describe how EF knows about the database schema.

How EF finds out about the database schema

EF holds a view of what it thinks the database schema is in what it calls the ‘database model’. This is a set of metadata that it builds at application start-up from the EF DbContext class and the associated classes referred to by your DbContext. There are therefore two main ways of telling EF what the database schema looks like. They are:

  1. EF uses your code to define the database schema.
    In this mode the developer normally defines/designs the classes and configures a special class called DbContext, which EF uses to calculate the EF metadata model, i.e. its view of what the database schema should look like. EF then uses what it calls a ‘database initializer’ to create the database. There are then ways to update the database if you change the EF database model – see Code First Migrations.
  2. EF imports the schema of an existing database to create the data classes.
    The other way is to allow EF to build the classes and DbContext based on the schema of an existing database. It does this by a one-time scan of the database schema, using templates to build the classes and DbContext (see another of my articles which describes this in detail). If you change the database then you can either re-import the database schema again, or there is also a way of using Code First Migrations with an existing database to handle updates.

The important thing to see is that, other than the one-time scan in point 2, EF never looks at the schema of the database, even when it is doing migrations. It just assumes the database is in the state that EF’s metadata model says it is. (For more on how this works under the hood see Max Vasilyev’s interesting article about how EF migrations are calculated and stored).

What happens if EF’s database model is not correct?

EF will only find out there is a mismatch between its database model and the actual database schema when it accesses a part of the database that does not match. I have not tried all possible combinations, but the error I got when I added a property called ‘EfOnlyProperty’ to an EF class without a corresponding column in the table was:

System.Data.Entity.Infrastructure.DbUpdateException : An error occurred while updating the entries. See the inner exception for details. --->
System.Data.Entity.Core.UpdateException : An error occurred while updating the entries. See the inner exception for details. --->
System.Data.SqlClient.SqlException : Invalid column name ‘EfOnlyProperty’.
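For reference, here is a minimal sketch of the kind of class that triggers this error. The Child class below is purely illustrative, not my actual test class:

public class Child
{
    public int ChildId { get; set; }

    //This property exists in the EF class, but there is no matching
    //'EfOnlyProperty' column in the [dbo].[Child] table, so writing
    //the entity throws the DbUpdateException shown above
    public string EfOnlyProperty { get; set; }
}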

Another example: when EF creates a database it can add some hidden tables to facilitate many-to-many relationships (see my article on how EF handles many-to-many relationships). This means that any changes to the database must include these extra tables for EF to work properly. If a class called ‘Parent’ has a many-to-many relationship with a class called ‘Child’ and the linking table is missing, then you get this error:

System.Data.Entity.Infrastructure.DbUpdateException : An error occurred while saving entities that do not expose foreign key properties for their relationships. The EntityEntries property will return null because a single entity cannot be identified as the source of the exception. Handling of exceptions while saving can be made easier by exposing foreign key properties in your entity types. See the InnerException for details. --->
System.Data.Entity.Core.UpdateException : An error occurred while updating the entries. See the inner exception for details. --->
System.Data.SqlClient.SqlException : Invalid object name ‘dbo.ParentChild’.

I needed an EF/SQL sanity checker

The idea of changing my database and having EF out of step was not appealing at all. That way you just don’t know what will happen, which is deadly.

I could build the database using SQL and use EF’s import existing database method described earlier. However I know from experience that it can be a pain when database updates come along and you need to re-import again. Also EF’s database import uses fixed property names and exposes every relationship, which isn’t ideal for good programming practice.

So I wanted to use the EF Code First approach yet define the database through T-SQL. That meant I needed something to check that my EF/software view of the database schema and the actual SQL database schema were in step. Ideally this would be something I could run as part of my Unit Tests so that any mismatch shows up immediately.

My solution to comparing EF/SQL databases

I therefore created some software that compares the EF database model metadata against the actual SQL schema as read from a database. It runs as a Unit Test: it reads the EF database model metadata and the actual SQL database schema, and then compares the following:

  • Each table: does a table exist for each EF class?
  • Each column: does a column match EF class properties in name, type/nullable, size, primary key(s) and key order?
  • Each relationship:
    • Do SQL foreign key constraints exist for each EF relationship?
    • Have the extra many-to-many linking tables that EF uses been configured properly?
    • Are the Cascade Deletes the same between EF and SQL?

If any of the above is out of step then the Unit Test fails and outputs useful error messages. At the same time it also produces warnings for tables and/or columns in the SQL database that EF does not use. These should not cause a problem to EF, but might show something that EF has missed.

Here is an example of a call to CompareEfWithDb to check that EF’s model matches the schema of the database that YourDbContext’s connection string points to:

using (var db = new YourDbContext())
{
    var comparer = new CompareEfSql();

    var status = comparer.CompareEfWithDb(db); 
    //status.IsValid is true if no errors. 
    //status.Errors contains any errors.  
    //status.Warnings contains any warnings
}

In my Unit Test I fail the test if the returned status is not valid, and I print out any error messages. I also tend to fail the test on warnings too, as they often point to something I have missed.
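As a sketch, such a Unit Test might look like the code below. I am assuming NUnit here, and that status.Errors and status.Warnings are collections of strings; adapt it to your own test framework and the actual status type:

[Test]
public void CheckEfModelMatchesSqlDatabase()
{
    using (var db = new YourDbContext())
    {
        var comparer = new CompareEfSql();
        var status = comparer.CompareEfWithDb(db);

        //fail the test on any error, and output the messages
        Assert.IsTrue(status.IsValid, string.Join("\n", status.Errors));
        //I tend to fail on warnings too, as they often point to something I missed
        Assert.IsEmpty(status.Warnings, string.Join("\n", status.Warnings));
    }
}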

Another, slightly different, method has an extra parameter with a connection string to a separate database. This allows me to check a different database schema, e.g. my production database, against the current EF model. By running this as an extended Unit Test before deployment I make sure that I know if the production database needs to be updated before the new software is deployed. This gives me great peace of mind that my SQL databases are in line with the current EF data classes.
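The call shape is the same apart from the extra connection string (the productionDbConnectionString name below is illustrative):

using (var db = new YourDbContext())
{
    var comparer = new CompareEfSql();

    //compare the current EF model against the database that the
    //given connection string points to, e.g. the production database
    var status = comparer.CompareEfWithDb(db, productionDbConnectionString);
}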

Here are two examples of the type of error output I get when running the CompareEfWithDb method. The first is a simple example of a missing column in the database. The error message from calling the CompareEfWithDb method is:

Missing Column: the SQL table [dbo].[Child] does not contain a column called EfOnlyProperty. Needed by EF class Child.

The second example is a class called ‘Parent’ which has a many-to-many relationship with a class called ‘Child’, but the linking table is missing. The error messages from calling the CompareEfWithDb method are:

  1. Missing Link Table: EF has a Many-to-Many relationship between Parent.ManyChildren and Child but we could not find a linking table with the right foreign keys.
  2. Missing Link Table: EF has a Many-to-Many relationship between Child.ManyParents and Parent but we could not find a linking table with the right foreign keys.

Add CompareSqlToSql for a fuller picture

I found the CompareEfWithDb method very helpful, but it doesn’t quite cover everything. As time goes on I am planning to move some of the more complex accesses to SQL Stored Procedures (SPs), both to gain better performance and to facilitate the decoupling of the database from the software. EF can use SPs but they don’t appear in the EF database model, so CompareEfWithDb couldn’t help.

Having created all the code for CompareEfWithDb it was pretty simple to create a SQL-to-SQL compare, called CompareSqlToSql. This can check one database against another, and can include a comparison of other SQL features. It does a slightly fuller comparison than the EF/SQL compare does, as it can compare everything directly rather than surmising some things, like many-to-many table settings, from EF. It can also check non-EF parts like SPs and their parameters.

Note: You still need good integration and system level testing to catch anything that these tools miss.

Using SQL compare to understand what EF is doing

I should say that CompareSqlToSql has proved to be much more useful than just checking SPs. It turns out that EF is somewhat forgiving if you make mistakes when you use EF’s Data Annotations or EF’s Fluent API to alter the way the database works. For instance, if you configure a required-to-optional relationship but give it a non-nullable foreign key, EF will add its own hidden nullable foreign key. In fact I found about three errors like this in my 16-table application, all around relationships.
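To make that concrete, here is a hedged sketch of that particular mistake. The class and property names are hypothetical, and the behaviour described in the comments is the one the text above describes:

public class MoreAboutAuthor
{
    public int MoreAboutAuthorId { get; set; }
    public string Biography { get; set; }
}

public class Blogger
{
    public int BloggerId { get; set; }

    //intended as the foreign key for an *optional* link, but being
    //int rather than int? it cannot be the FK of an optional relationship
    public int MoreAboutAuthorId { get; set; }
    public MoreAboutAuthor MoreAboutAuthor { get; set; }
}

//inside OnModelCreating: the relationship is declared optional, so EF
//cannot use the non-nullable property above and quietly adds its own
//hidden nullable column, typically named MoreAboutAuthor_MoreAboutAuthorId
modelBuilder.Entity<Blogger>()
    .HasOptional(b => b.MoreAboutAuthor);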

If you want to create a database that EF can use you need to fix these errors in your code, or at least replicate what EF has done for the database to work. CompareEfWithDb won’t spot them for you, but comparing an EF-generated database with your script-generated database will find them. To do this you need to:

1. Create a new database using the DbContext method .Database.Create(); you then have a SQL database based on EF’s database model. We will call it MyEfDb, and I give the code below:

using (var db = new YourDbContext(MyEfDbConnectionString))
{
    //remove any previous version so we get a clean, EF-generated schema
    if (db.Database.Exists())
    {
        db.Database.Delete();
    }
    //create a database whose schema comes purely from EF's database model
    db.Database.Create();
}

2. Then create a database using your scripts and DbUp (see first article). We will call it MySqlDb.

3. Now run CompareSqlToSql (or another SQL schema compare tool; see below) with MyEfDb as the reference database and MySqlDb as the ‘to be checked’ database.

You can spot EF’s corrections to your minor configuration mistakes mainly by the column names, which normally contain a name with a _ in the middle. I have also found places where the cascade delete option was different, again through me misconfiguring a relationship.

Using a SQL compare to build your scripts

The other reason for doing a SQL compare of an EF-generated database with your current script-generated database is to build the scripts you need to update your database after you have made changes to the EF classes. There are various tools that can compare two SQL databases and provide a script to update one to the other – see this useful list on stackoverflow, although it is a bit old. Note: some of these are commercial products that you need to buy.

If you don’t have one of these then the output of CompareSqlToSql will show all the SQL differences, though not in a nice SQL-script form. For instance, if we repeat the many-to-many example above, a class called ‘Parent’ which has a many-to-many relationship with a class called ‘Child’ where the linking table is missing, the error messages from CompareSqlToSql are:

  1. Missing Table: The ‘MyEfDb’ SQL database has a table called [dbo].[ParentChild], which is missing in the ‘MySqlDb’ database.
  2. Missing Foreign key: The ‘MyEfDb’ SQL database has a foreign key Parent: ParentChild.Parent_ParentId, Referenced: Parent.ParentId, which is missing in the ‘MySqlDb’ database.
  3. Missing Foreign key: The ‘MyEfDb’ SQL database has a foreign key Parent: ParentChild.Child_ChildId, Referenced: Child.ChildId, which is missing in the ‘MySqlDb’ database.

Note: If anyone is interested in the CompareEfWithDb and CompareSqlToSql Unit Test methods I could make the package publicly available. However the first version was a massive 2-day hack and does not handle all possible EF combinations, such as TPT and TPH inheritance, complex types etc. It therefore needs a lot more work before it can be released for general use.

Telling Entity Framework that you will handle migrations

The last thing we need to do is stop EF handling database changes when you change your EF code. If you don’t turn this off then, when you change the EF database classes, EF will block you from running the application until it has updated the database, using whatever EF ‘database initializer’ is set up (EF’s default initializer is CreateDatabaseIfNotExists).

To stop EF trying to handle migrations, all we have to do is provide a null database initializer. There are two ways of doing this (you only need one):

  1. Call ‘Database.SetInitializer<YourDbContext>(null)’ at startup (see the code sketch after this list)
  2. Add the following to the <appSettings> part of your web/application config file, e.g.
<appSettings>
    <add key="DatabaseInitializerForType YourApp.YourDbContext, YourApp" value="Disabled" />
</appSettings>
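For completeness, option 1 is a one-line call in your application start-up code, e.g. in Global.asax’s Application_Start, before the context is first used:

using System.Data.Entity;

//a null initializer stops EF creating or updating the database itself;
//the database schema is now entirely your responsibility
Database.SetInitializer<YourDbContext>(null);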

I personally add the appSetting to my config file as that way I know it is done.

Note: See this useful documentation for more on null database Initializers.

Conclusion

Well done for reading this quite long article. In this post I have described how EF keeps its model view in line with the actual database schema. From this we saw that it does not read the schema, other than in the one very limited situation of importing an existing database. I also described what happens if EF’s database model and the actual database are out of step.

I then went on to describe a Unit Test tool that I have developed to check if EF’s database model is the same as the actual SQL schema. Alternatively, you can use the technique in the section ‘Using a SQL compare to build your scripts’ to create two databases and then use some sort of SQL compare tool to get the difference script.

Note: If you are interested in using the CompareEfWithDb and CompareSqlToSql Unit Test methods then let me know. I won’t be able to get a release out of this package for some time as it needs a lot of extra work to support all EF’s features and I am busy working on another project.

Finally I describe how to stop EF from trying to look after changes in the database model, which you need to do if you are taking over the job of handling database migrations.

Happy coding!

Postscript

I am now using this approach on my existing e-commerce project and I needed to make my first change to the database schema. The change was simple, adding two bytes to an existing table, which is a non-breaking change to the database. It is worth comparing the new process with the old process.

Previously, when I was using EF data migrations, such a change used to take me 30+ minutes. Having made the change to the data classes I needed to check those changes by updating/recreating the Unit Test database. This I had to do in a different way to the local/Azure databases, because the DatabaseInitializer was different, which required me to edit out some code, delete the Unit Test database, and then restore the code afterwards. Also, because the way I updated the Unit Test database was different to the local/production databases, I needed to check everything again when I updated the local/Azure databases.

In comparison, using the process laid out in these two articles it took me less than 15 minutes to make the changes on all three databases: the Unit Test database, the local developer database and Azure. I used the same method to update all three databases, but in the case of the Unit Test database I also did a complete delete/setup to ensure there isn’t a problem later. However, once I had changed the Unit Test database and run CompareEfWithDb and CompareSqlToSql, I was very sure that the other two updates would work. Much less stressful.

I also found it more natural to define the alter-table change in T-SQL. Maybe it’s just my preference, but you do get IntelliSense when writing the T-SQL script in Visual Studio.

Nice to see it working for real.

 

Handling Entity Framework database migrations in production – part 1, applying the updates

This is the first article in a series about using Microsoft’s data access technology, Entity Framework (EF), in an application that has to be ‘always on’ (in my case an ASP.NET MVC e-commerce web site) yet is still being developed, so there is a need to update the database schema, i.e. the tables, views, constraints etc. that define how the data is held in the database.

This is a series, and in this first article I deal with the issue of updating the database schema in a robust and testable way. The list of articles is:

  1. Handling Entity Framework database migrations in production – part 1, applying the updates (this article)
  2. Handling Entity Framework database migrations in production – part 2, keeping EF and SQL schema in step

However, before detailing my update method I will start by describing the general problems around a database update.

Why production database updates are so challenging

Databases contain data, and that data is often very valuable to someone. In the e-commerce site I am working on, the database holds customer orders which they have paid for and expect to be delivered. If we want the business to succeed, that data had better not be lost or corrupted.

The problem is that as the business grows we are likely to need to change the type/format of data we hold in the database, known as the database schema. Simple changes, like adding a column to hold, say, a comment, are normally easy, as the new column will work with the existing software. However more complex additions, say a new table which links to existing data, i.e. in SQL terms it has a foreign key, are a lot more difficult to do.

Updates also become especially challenging if you want the existing web site to stay ‘live’ while you change the database, i.e. the old software still works with the new database while you are updating its schema and data.

In my case the application is an e-commerce web site using ASP.NET MVC with EF and is running on Microsoft Azure. In the diagram below we see a typical ‘live’ deployment via Azure’s staging slot.

Azure staging-live diagram

The point about this arrangement is that the staging slot holds the new version of the software, which is accessing the same database as the live site. When the new version checks out, the DevOps person swaps the staging and live sites. The users should see no interruption to their use of the site.

The challenge is if the new software needs a change to the database schema the database must be updated while the existing version of the software is still running. Also the database must support both the old and the new software until the swap is done. That is quite a challenge.

Note: You can read more about the challenges of deploying and updating databases on the Simple-Talk web site, which has a whole section on Database Lifecycle Management.

My requirements for the database deployment process

After a lot of research I came up with a list of requirements for a system that deploys database updates. I decided the process must be:

  • Testable: I can test any database change before running it on the Production database.
  • Automated: I can automate the whole process so that I can’t get it wrong.
  • Trackable: Each database should have a log of what has been done to its schema.
  • Atomic: The update process must either complete successfully or be entirely rolled back.
  • Recoverable: Each update should automatically make a backup in case the worst happens.
  • EF-compatible: The database changes have to work with Entity Framework.
  • Comprehensive: There must be full SQL access; I needed to change tables, views, stored procedures, constraints etc.

Why I decided EF’s standard migration feature was not sufficient

I am very familiar with EF’s database migration feature and I have used it in the early stages of developing an application. However, it’s this knowledge that made me decide it wasn’t sufficient for this current project. The primary reason is because it is not very testable, i.e. it is hard to run the update on a test database. A secondary reason is that I had come across problems when applying migrations to a Production system – the update runs as part of the application start-up and feedback of errors is very limited. There is a way to run EF migrations from a command line, which does improve the process, but the problem of testability still remains.

All in all I just didn’t feel comfortable with using EF’s own database migration feature, so I set out to find a better way. After a lot of searching I found a package called DbUp.

Note: I talk more about how EF matches its view of the database with the actual schema of the database in the second article.

Database Migrations the DbUp way

DbUp is an open-source, .NET-based tool for running scripts on a database. DbUp is elegantly simple, and uses an “Apply These Scripts…” approach, i.e. you write a list of scripts, SQL or C# based, with a name that sets the order, e.g. Script001, Script002 etc. You can call DbUp via a Console App or directly from PowerShell. It uses a special table in the database to see which scripts have been run and runs any scripts that are new.

Here is an example of using DbUp from my application:

using System.Reflection;
using DbUp;

var upgrader = DeployChanges.To
        .SqlDatabase(dbConnectionString)
        .WithScriptsAndCodeEmbeddedInAssembly(Assembly.GetExecutingAssembly())
        .WithTransaction()
        .LogToConsole()
        .Build();

var result = upgrader.PerformUpgrade();

if (!result.Successful)
{
    //report result.Error and fail the deployment
}

Looks simple, but there are lots of little things in DbUp that show someone has really thought the process through. I have to admit I found the documentation a little obscure in places, but that is because I was most likely thinking in an EF way. Once I got my head around it I started to appreciate the quality of DbUp.

A snippet from one of my test T-SQL scripts looks like this (the $schema$ placeholders are filled in by DbUp’s variable-substitution feature):

create table $schema$.[DataTop](
	[DataTopId] [int] IDENTITY(1,1) PRIMARY KEY,
	[MyString] [nvarchar](25) NULL,
	[DataSingletonId] [int] NULL,
)
GO

create table $schema$.[DataChild](
	[DataChildId] [int] IDENTITY(1,1) PRIMARY KEY,
	[MyInt] [int] NOT NULL,
	[MyString] [varchar](max) NULL,
	[DataTopId] [int] NOT NULL
)
GO

ALTER TABLE $schema$.[DataChild]
ADD CONSTRAINT [FK_dbo.DataChild_dbo.DataTop_DataTopId] FOREIGN KEY([DataTopId])
REFERENCES $schema$.[DataTop] ([DataTopId])
ON DELETE CASCADE
GO

A couple of pointers for people who are used to EF Migrations:

  1. DbUp only does forward changes, i.e. there is no DOWN script like in EF migrations. The philosophy of DbUp is that at each stage you are transitioning the database to the new state, so going back is simply another transition.
    Considering that I have only used an EF DOWN script about once in four years, I don’t think this is a problem.
  2. There is no ‘run seed every time’ approach like EF’s. When you think about it, you run a seed once at the start, so why do you need to run it again? In DbUp, if you want to change the seed data you just add a new script to update or add to the original seed.

Note: DbUp does have a way of running a script every time if you want to; see DbUp’s NullJournal.
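As a hedged sketch of how that works (the ‘EveryTime’ name filter is just an illustrative convention for picking out the always-run scripts):

using System.Reflection;
using DbUp;
using DbUp.Helpers;

//scripts registered against a NullJournal are never recorded as 'run',
//so DbUp executes them on every upgrade - useful for idempotent scripts
var everyTimeRunner = DeployChanges.To
    .SqlDatabase(dbConnectionString)
    .WithScriptsEmbeddedInAssembly(Assembly.GetExecutingAssembly(),
        name => name.Contains("EveryTime"))
    .JournalTo(new NullJournal())
    .LogToConsole()
    .Build();

var result = everyTimeRunner.PerformUpgrade();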

One downside of using SQL scripts to apply database changes is that, unless the project has a DBA (Database Administrator), the developer will need to learn T-SQL. While I am not a DBA I have written quite a bit of T-SQL in my time. In fact I personally prefer T-SQL for configuring a database, as I find using EF’s fluent interface can be quite difficult in some situations.

Conclusion

In this first article I have described the issue I have with EF’s own database migration code and shown an alternative method of applying updates to the schema of a SQL database using DbUp. However, on its own this will not work, as EF will not know you have changed things.

In the second article I will look at a method I have created to make sure that the view of the database schema that EF has does not diverge from the actual database schema.

Happy coding!

GenericServices Masterclass: a deeper look at deleting entries

I have been building a new application using GenericServices and I thought it would be useful to others to reflect on the approaches I have used when working with more complex data. In this second master class article I am looking at deleting entries, i.e. data rows, in a SQL Server database using Entity Framework. The first master class article looked at creating and updating entries and can be found here.

I wrote this article because, the more I used GenericServices’ IDeleteService in real applications, the more I ended up using the more complex DeleteWithRelationships version of the command. This article says why, and suggests some ways of tackling things, including not deleting at all!

What is GenericServices?

For those of you who are not familiar with GenericServices it is an open-source .NET library designed to simplify the interface between an ASP.NET MVC application and a database handled by Microsoft’s Entity Framework (EF) data access technology. You can read more about it at https://github.com/JonPSmith/GenericServices where the Readme file has links to two example web sites. The GenericServices’ Wiki also has lots more information about the library.

While this article talks about GenericServices I think anyone who uses EF might find it useful as it deals with the basics behind deleting entities.

An introduction to relationships and deletion

Note: If you already understand the different types of referential relationships in a relational database and how they affect deletion, then you can skip this part and go to the section titled ‘Deleting with GenericServices’.

On the face of it deleting rows of data, known in EF as entities, seems like a simple action. In practice, in relational databases like SQL Server it can have far-reaching consequences, some of which might not be obvious at first glance when using EF. Let me give you a simple example to show some different aspects of delete.

Below is a database diagram of the data used in the example application SampleMvcWebApp, which contains:

  1. A selection of Tags, e.g. Programming, My Opinion, Entertainment, etc. that can be applied to a Post.
  2. A list of Posts, where each is an article on some specific topic.
  3. A list of Bloggers, who are the authors of each of the Posts.

(database diagram showing the Tag, Post and Blog tables and their links)

Now this example shows the two most used types of relationships we have in a relational database. They are:

  • One-to-many (1-*): A Post has one Blogger, and a Blogger can have from zero to many Posts.
  • Many-to-many (*-*): A Post can have zero to many Tags, and Tags may be used in zero to many Posts.

The other type of basic relationship is the one-to-one relationship, where one entity/table is connected to exactly one other. You don’t see this type of relationship so much because, if both ends are required (see below), the two tables can simply be combined. However you do see one-to-one relationships where one end is optional.

The other aspect of the ‘one’ part of a relationship is whether it is required or optional (required: 1, optional: 0..1). An example of a required relationship is that a Post must have a Blogger, as that defines the author. An example of an optional relationship would be allowing the Blogger to add an optional ‘more about the author’ entry in another table. The author can choose to set up that data, or not bother.

How EF models these relationships

EF has multiple ways of setting up relationships and I would refer you to their documentation. Here is my super simple explanation to help you relate the section above to Entity Framework classes (with a code sketch after the list):

  • The ‘One’ end of a relationship has a property to hold the key (or multiple properties if a composite key).
    • If the key property(ies) is/are nullable, e.g. int? or string without [Required] attribute, then the relationship is optional.
    • If the key property(ies) is/are not nullable, e.g. int or string with [Required] attribute, then the relationship is required. See the BlogId property in the Post class of SampleMvcWebApp.
  • The ‘Many’ end of a relationship is represented by a collection, normally ICollection<T> where T is the class of the other end of the relationship. See the Posts property in the Blog class of SampleMvcWebApp.
  • Many-to-Many relationships have Collections at each end, see the Tags property in the Post class and the Posts property in the Tag Class of SampleMvcWebApp.
    EF is very clever on many-to-many relationships and automatically creates a new table that links the two classes. See my article Updating a many to many relationship in entity framework for a detailed look at how that works.
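Putting that together, here is a minimal sketch of all three patterns. It is simplified, and not the actual SampleMvcWebApp classes:

using System.Collections.Generic;

public class Blog
{
    public int BlogId { get; set; }
    public string Name { get; set; }

    //the 'many' end: a Blogger can have zero to many Posts
    public ICollection<Post> Posts { get; set; }
}

public class Post
{
    public int PostId { get; set; }
    public string Title { get; set; }

    //non-nullable foreign key, so every Post must have a Blog (required)
    public int BlogId { get; set; }
    public Blog Blogger { get; set; }

    //many-to-many: collections at both ends; EF creates the hidden linking table
    public ICollection<Tag> Tags { get; set; }
}

public class Tag
{
    public int TagId { get; set; }
    public string Name { get; set; }
    public ICollection<Post> Posts { get; set; }
}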

How do these relationships affect deletion?

If you delete something in a relational database that has some sort of relationship then it is going to affect the other parts of the relationship. Sometimes the consequences are so small that they don’t cause a problem. However, especially in one-to-one or one-to-many relationships, the effect of a delete has consequences that you need to think about. Again, let me give you two examples you can actually try yourself on the SampleMvcWebApp web site.

  1. Delete a many-to-many relationship. If you go to the Tags Page of SampleMvcWebApp and delete a Tag, then when you look at the list of Posts you will see that that tag has been removed from all Posts that used it (now press Reset Blogs Data to get it back).
  2. Delete a one-to-many relationship. However, if we go to the list of Bloggers on SampleMvcWebApp and delete one of the Bloggers, then when you look at the list of Posts you will see that all Posts by that author have gone (now press Reset Blogs Data to get them all back).

So, what happened in the second one? Well, in this case the database could not keep the relationship intact without doing something, because the Post’s Blog link wasn’t optional. There were two possibilities: either it could delete all the Posts for that Blogger, or it could have refused to carry out the delete.

By default EF sets up what are called ‘cascade deletes’, which is how SampleMvcWebApp is set up, and that is what deleted the related Posts for that Blogger. If we turned off cascade deletes then we would get a ‘DELETE statement conflicted with COLUMN REFERENCE’ error (SQL error 547) and the delete would fail.

The simplified rule is: if entity A is involved in a required relationship with entity B, then when A is deleted something has to happen: either B is deleted too, or the delete fails. Which happens depends on how you configure EF.
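In EF6 that choice is made per relationship, normally in the Fluent API. A hedged sketch, reusing the Blog/Post classes from the earlier sketch, of turning cascade delete off would be:

//inside OnModelCreating: keep the relationship required, but make a
//delete of a Blog fail (SQL error 547) while it still has Posts,
//instead of cascade-deleting those Posts
modelBuilder.Entity<Post>()
    .HasRequired(p => p.Blogger)
    .WithMany(b => b.Posts)
    .HasForeignKey(p => p.BlogId)
    .WillCascadeOnDelete(false);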


Deleting with GenericServices

GenericServices has a service called IDeleteService which has two delete methods:

  1. Delete<T>(key1, key2…) (sync and async), which deletes the row with the given key, or keys if it has a composite key, from the EF entity referred to by class T, e.g. Delete<Post>(1) would delete the Post with the key of 1.
  2. DeleteWithRelationships<T>(removeRelationshipsFunc, key1, key2…) (sync and async), which does the same, but calls the removeRelationshipsFunc as part of the delete.

I am not going to detail how to use them as the GenericServices Wiki has a good description. You can also find an example of the use of Delete<T> at line 121 in PostsController and an example of the use of DeleteWithRelationships<T> at line 218 in CustomerController.

The only other thing I would say is that deleting entries with composite keys is straightforward: just supply the keys in the order in which they occur in the database. Note: some programmers don’t like composite keys, but I do find them useful. There are places where composite keys are good at segregating data into groups: the primary key can be the group name, the secondary key the name of the item itself.
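As an illustration of the call shapes (the service variable is an IDeleteService instance, e.g. from your DI container, and OrderLine is a hypothetical entity with a two-part key):

//delete the Post with primary key 1
service.Delete<Post>(1);

//composite key: supply the keys in the order they occur in the database
service.Delete<OrderLine>(orderId, lineNumber);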


When I first wrote GenericServices I just had a Delete<T> method. I very soon found that wasn’t enough, so I added DeleteWithRelationships<T>. Now I find I am using DeleteWithRelationships 80% of the time.

The rest of the article is about why I use DeleteWithRelationships so much, and a few pointers on alternatives to using Delete at all.

Why I use DeleteWithRelationships so much

I have used DeleteWithRelationships in three situations that I will describe:

  1. To provide better error messages when a delete would fail.
  2. To delete other associated entities that would not be caught by cascade deletes.
  3. To delete associated files etc. outside the relational database.

1. To provide better error messages when a delete would fail

I have many instances where I don’t want cascade deletes to happen, so if I tried the delete I would get SQL error 547. While GenericServices catches this and provides a fairly helpful error message, it isn’t that informative. I therefore often (actually, nearly always) use DeleteWithRelationships to provide a better error message. Let me give you an example.

In a web site I was building, designers could set up designs with text fields. Each field had a ColourSpec, which is a database entity. I allowed a designer to delete a ColourSpec as long as that colour isn’t used in any of the text fields. If it is used then I output a list of designs where it is used, so that the designer can decide if they want to remove those references and try the delete again.
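As a very rough sketch of that pattern, the code below shows the shape of the check inside my removeRelationshipsFunc. The exact delegate signature and status type come from GenericServices (see its Wiki), so everything here, including the Status helper and the entity names, is illustrative only:

//hypothetical: refuse to delete a ColourSpec that is still in use,
//and list the designs that use it in the error message
//(db is the current DbContext passed in by GenericServices)
var designsUsingColour = db.Set<Design>()
    .Where(d => d.TextFields.Any(tf => tf.ColourSpecId == colourId))
    .Select(d => d.Name)
    .ToList();

if (designsUsingColour.Any())
    return Status.Error("This colour is used in: "
        + string.Join(", ", designsUsingColour));
return Status.Success();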

2. To delete other associated entities that would not be caught by cascade deletes

This happens rarely, but sometimes I have a complex relationship that needs more work. I found one in the AdventureWorks database that I use in my example application complex.samplemvcwebapp.net. In this database a customer address consists of two parts: the CustomerAddress, which has a relationship to an Address. Now, if we want to delete one of the customer’s addresses, say because they have closed that office, we want to try to delete the associated Address part too, which isn’t linked by cascade deletes.

By using DeleteWithRelationships I can pick up the associated Address relationship and delete that too. In fact I might need to do a bit more work, to check whether I can delete it, as it might be referenced in an old order. By calling a method written specifically for this case I can insert special logic into the delete service.

NOTE: the delete done by GenericServices and any extra deletes done in the DeleteWithRelationships method are all executed in one commit. That means if either part of the delete fails then both are rolled back, which is exactly what you need.

3. To delete associated files etc. outside the relational database

In a recent web application the system included image files. I chose to store them in Azure blob storage, which I have to say works extremely well. The database stored all the information about the image, including a url to access it from the blob storage, while the images themselves, which can be very large, were stored as blobs.

Now, when I delete the database entity about the image I would end up with an orphan image in the blob storage. I could have a WebJob that runs in the background to delete orphan images, but that is complicated. What I did instead was add code to the DeleteWithRelationships method to delete the images as part of the delete process.

There is obviously a danger here. Because the database and the blob storage are not connected, there is no combined ‘commit’, i.e. I might delete the images and the SQL database delete might then fail, leaving url links that don’t point to a valid image. In this case I have made a pragmatic choice: I check that the delete should work by checking all the SQL relationships before I delete the blob images. I could still get a failure at the SQL end, which would make things go wrong, but other parts of the system are designed to be resilient to not getting an image, so I accept that small possibility in exchange for a simpler system. Note that if I fail to delete the images from the blob storage I don’t stop the SQL delete; I just end up with orphan images.

An alternative to Delete

In working web applications I find it is sometimes better not to delete important entities, especially if I want to keep a history of orders or usage. In these cases, which happen a lot in live systems, I now add an enum ‘Status’ to my key entities which has an ‘Archive’ setting. So, instead of deleting something, I set its Status to Archive.

All I have to do to make this useful is filter which entities are shown based on this Status flag. That way entries with a Status setting of ‘Archive’ are not shown on the normal displays, but can be seen in audit trails or admin screens.

I needed a Status flag anyway, so there is little extra overhead in providing this on my key entities. Also, in EF 6 you can now put an index on a property, so the query is fairly quick. You need to think about your application, but maybe this would work for you too.
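A minimal sketch of the pattern is below. The Order entity is hypothetical, and the [Index] attribute needs EF 6.1 or later:

using System.ComponentModel.DataAnnotations.Schema;
using System.Linq;

public enum EntityStatus : byte { Normal, Archive }

public class Order
{
    public int OrderId { get; set; }

    //indexed so the 'not archived' filter below stays quick
    [Index]
    public EntityStatus Status { get; set; }
}

//normal displays filter out archived entries...
var visible = db.Orders.Where(o => o.Status != EntityStatus.Archive).ToList();
//...while audit trails or admin screens can still see everything
var everything = db.Orders.ToList();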

Note: See the comment below by Anders Baumann, with a link to an article called ‘Don’t Delete – Just Don’t‘, for a well-formed argument for not using delete.

Conclusion

Deleting looks so simple, but my experience is that it is often more complicated than you think. Hopefully this article gives you both the background on why, and the detail on how, to use GenericServices to delete entities. I hope it’s helpful.

Happy coding!