Analysis of Entity Framework 6 async performance

Last Updated: July 29, 2014 | Created: July 17, 2014
Quick Summary
This post is an appendix to an article I am writing for the Simple-Talk blog site. It contains the detailed results of side-by-side comparisons of sync and async versions of Entity Framework (EF) 6 data accesses. These show that async EF commands are not much slower than normal sync commands.

I am writing an article called ‘The .NET 4.5 async/await Commands in Promise and Practice’ for the Simple-Talk blog. In this I look at the whole area of the async/await commands for tasking.

In putting this article together I ran extensive side-by-side comparisons of normal, synchronous Entity Framework (EF) commands and the new async versions of the same commands. I found some surprising results which I think others might be interested in.

The Simple-Talk article wasn’t the right place to put all this technical detail so I have written this blog as an appendix for those of you that would like to drill-down into the detail. I think you will find the results interesting.

Overview of the test database

[Figure: The three classes used in the database: Tag, Post and Blog.]

I am building a .NET library called GenericServices (see previous blog post on this) and as the test database I have a simple blog site with three classes in it: Blog, Post and Tag. The diagram shows the data classes and their relationships.

Most of the tests are done on the Post class, which is the most complex to update, and deleting it does not cause cascade deletes. Also, for update I assume a disconnected update, i.e. as would happen on a web site, where the original Post is read and then specific properties are updated before they are written back to the database.

Test 1. Raw EF performance

The first test results I will share with you are the raw speeds of the sync and async versions of the EF commands. This was done inside an NUnit test running inside Visual Studio 2013 on my development PC, which has an Intel i7 processor clocked at 3.31GHz. The database is LocalDb running SQL Server 2012. I ran 1000 tests on a database filled with 1000 entries and averaged out the time for each command.

You can see the unit test code here (I used method Perf21NCompareEfSyncAndAsync1000Ok around line 61) and the actual EF commands here (note: GenericServices is an open-source project).

Let me list the results first and then draw some conclusions from this.

Command        Sync (ms)   Async (ms)   Diff   Notes
List, Direct      2.80        7.80      279%   Just reads Post class
List, DTO        16.80       21.00      125%   Reads Post, Blog and Tags
Create           10.40        8.80       85%   Reads Tags and write Post
Update           15.70        9.70       62%   Reads Post, Tags and write Post
Delete            0.90        1.10      122%   Simple state update
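
The Diff column is not defined in the table itself, but the figures are consistent with it being the async time expressed as a percentage of the sync time, so anything under 100% means async was faster. The small sketch below recomputes it from the timings in the table; the function and variable names are my own, and the rounding to a whole percent is an assumption.

```javascript
// Timings (ms) copied from the results table above
var timings = [
    { name: 'List, Direct', syncMs: 2.80, asyncMs: 7.80 },
    { name: 'List, DTO', syncMs: 16.80, asyncMs: 21.00 },
    { name: 'Create', syncMs: 10.40, asyncMs: 8.80 },
    { name: 'Update', syncMs: 15.70, asyncMs: 9.70 },
    { name: 'Delete', syncMs: 0.90, asyncMs: 1.10 }
];

// Assumption: Diff = async time as a percentage of sync time, rounded
function diffPercent(syncMs, asyncMs) {
    return Math.round(asyncMs / syncMs * 100);
}

timings.forEach(function (t) {
    var diff = diffPercent(t.syncMs, t.asyncMs);
    var verdict = diff < 100 ? 'async faster' : 'async slower';
    console.log(t.name + ': ' + diff + '% (' + verdict + ')');
});
```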

Analysis of raw EF performance

  1. Async does not have much of an overhead
    I was surprised that the async versions were so fast. Lots of blog posts warn about async being slow, but on the simplest method, which was listing the Post class, it was only 5ms slower. That, I think, is impressive. However, in the unit tests the context it was saving was small (see my Simple-Talk article to learn more about context) and it also caches the context, so one-off commands might take longer. I look at single reads later in the real-world section below.
  2. Some of the async commands are faster!?
    You can see that the async versions of create and update are faster (the Diff values below 100%). Why is that? I am not totally sure, but I think it is because there are multiple database accesses (see the notes column) and I think it manages to overlap these. If someone has a better explanation then please let me know.

Test 2. Real-world performance on ASP.NET MVC5 site

Raw figures like that are interesting, but it’s real-world performance that matters. I have a companion MVC5 web site called SampleMvcWebApp, also open-source, which I try things out on. This has the same Blog, Post, Tag format described at the start, but with only a few entries (default 17). I have this MVC web app on different hosting environments, plus internally:

  1. A low-cost shared site running Windows Server 2012 R2 through a UK company called WebWiz. I would recommend WebWiz (no, they are not paying me) as they seem to support the latest Microsoft environments quickly (they were one of the first to support SQL 2012 in the UK) and they are relatively cheap.
  2. A Windows Azure account where I can change the level of hosting performance.
  3. Locally on my development server.

Try this yourself!
The SampleMvcWebApp MVC5 web site is live and available for you to access and play with. Its address is http://samplemvcwebapp.net/

My tests are done using ab, the Apache HTTP server benchmarking tool. This allows me to load the site and also get averaged performance figures. I used the benchmark tool to read the posts list (17 entries) in two modes: a) one user trying 100 times, b) 50 users trying twice, all at the same time. The results are:

Host + action                          Sync (ms)   Sync SD (ms)   Async (ms)   Async SD (ms)
WebWiz
 – List Posts ave(100), one user           90            4             95             9
 – List Posts ave(100), 50 users         1065          450           1050           450
Azure, standard
 – List Posts ave(100), one user          110           11            120            50
 – List Posts ave(100), 50 users         1200          500           1200           500

 

Notes:
– The Sync (ms) and Async (ms) columns are the average total time to get one list result from the web site using a synchronous and async EF method respectively.
– The SD (ms) column holds the standard deviation and shows the amount of variability of the results.
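
For readers less familiar with the two statistics in the table, here is a minimal sketch of how an average and a standard deviation can be computed from a set of request times. The sample times are made up for illustration, and I have assumed the sample (n - 1) form of the standard deviation; with 100 requests the difference from the population form is small.

```javascript
// Average of a list of numbers
function mean(values) {
    var total = values.reduce(function (sum, v) { return sum + v; }, 0);
    return total / values.length;
}

// Sample standard deviation (divides by n - 1): an assumption, see text
function standardDeviation(values) {
    var m = mean(values);
    var sumOfSquares = values.reduce(function (sum, v) {
        return sum + (v - m) * (v - m);
    }, 0);
    return Math.sqrt(sumOfSquares / (values.length - 1));
}

// Hypothetical request times in milliseconds, not real measurements
var requestTimesMs = [88, 90, 92];

console.log('average (ms):', mean(requestTimesMs));
console.log('SD (ms):', standardDeviation(requestTimesMs));
```

A small SD relative to the average, as in the one-user rows, means the individual request times were tightly grouped; the 50-user rows vary far more.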

Analysis of MVC web site performance

    1. Async listing is slightly slower for one user
      There is a small difference, less than 10%, between the sync and async versions. However this must all be due to the use of async as the data is the same. It amounts to 5 to 10 milliseconds, which is about the same as we saw in the raw EF figures earlier.
    2. Using Async seems to have had no effect on the scalability of the web site.
      You will see that the figures for 50 simultaneous users are almost the same for both sync and async commands. That makes sense as the database is small, the query is fairly simple and there is almost no waiting for the database (raw numbers for the database access part on small databases are 2 to 3 milliseconds). Clearly, with much more complex data accesses async would start to pull away in terms of scalability of the web site, because the thread would be freed up for longer while the database is doing its work.
    3. Most of the time is spent in the MVC/HTTP part of the process.
      If you look at the timings for one request, captured using Google Chrome’s developer tools, you can see the parts of the timeline below:

[Figure: Chrome developer tools timeline for one List Posts request on the WebWiz site]

    This clearly shows what I said in the Simple-Talk article – that optimising/caching the other files, like CSS and JavaScript files, would have far more effect than worrying about whether to use a sync or async data request.

I will end this section by saying it is quite difficult to test scalability on live web sites because they are designed to take a lot of load. You might like to look at Steven Sanderson’s video about threads and scalability which has an excellent demo http://channel9.msdn.com/Events/TechDays/Techdays-2012-the-Netherlands/2287.

Overall Conclusions

EF async data accesses have only a small overhead over the standard, sync commands. That is big news, and contrary to some of the older documentation that is out there. So feel free to use async EF commands in your next application.

Happy coding.

Alpha release of GenericServices

Last Updated: August 15, 2014 | Created: July 3, 2014
Quick Summary
This post introduces my new GenericServices framework designed to lighten the development load of building a  service layer.
This post summarises what GenericServices is about and the motivation behind it. It also provides links to code source and the example web site.

I am pleased to announce the alpha release of my new Open Source project called GenericServices, available on GitHub. It will also be available on NuGet when it has a more stable interface (watch this space).

GenericServices is a .NET class library which helps a developer build a service layer, i.e. a layer that acts as a facade/adapter between your business/data service layers and your User Interface or HTTP service.

It does this by providing standard database CRUD (Create, Read, Update and Delete) methods and a standard way of calling business methods, each with clear extension points. The aim is to cut out the standard boiler-plate code so the user only has to write the data or business specific code.

What application frameworks can GenericServices work with?

GenericServices is designed to work as a service layer framework in any .NET application, such as ASP.NET MVC, Windows Azure Web apps, etc. It assumes a disconnected state model, e.g. a web site or HTTP RESTful service where the read of data prior to update is disconnected from the update of the data.

I have also assumed a horizontal scaling model, e.g. scale by having multiple web instances, as this is how Azure and most web sites scale. I have therefore not thought about serialisation of objects for vertical scaling, i.e. where each layer of the application is run on a separate server and remote calls are used between layers.

GenericServices uses the following .NET frameworks/systems.

  • It needs .NET 4.5 because it implements all commands in normal and the new async/await Tasking format introduced in .NET 4.5
  • It uses Entity Framework 6 (EF6) for database access, again because it supports async commands.
  • It also makes use of the open source AutoMapper library for transforming data and business classes to/from the user interface oriented DTOs.

What is the motivation behind building GenericServices?

I design and develop fairly complex analysing, modelling and data visualisation web applications (see Spatial Modeller). These require a Domain-Driven Design approach to the data and business objects, while the visualisation needs a comprehensive user interface which can include a Single Page Application (SPA) fed by a REST interface. This means there is often a mismatch between the business/data layer classes and the data needed by the user interface and SPA.

My experience is that the Service Layer, plus Data Transfer Objects (DTOs), is the best way to solve this mismatch. However I have found that the service layer is often filled with lots of code that is very similar, with just the data being different. It’s also pretty boring to write. I therefore researched a number of approaches to handle the standard code and finally came up with a solution using C#’s Generic classes. I have therefore called it GenericServices.

Q: How would I use it? A: Look at example web site!

I have taken the trouble to produce a live example web site. This has examples of all the GenericServices commands, with documentation explaining how they work – for example see the Posts Controller code explanation page.

As well as showing the use of GenericServices it also contains another companion project of mine: code for executing long-running methods with progress and cancellation controls in ASP.NET MVC using SignalR.

This web site is an ASP.NET MVC5 application and is itself an open source project called SampleMvcWebApp. The source is available on GitHub.

Feedback

While it is in the alpha phase I suggest you leave comments here or contact me via this web site’s Contact Page. Once it is up on NuGet I will try to set up a Stack Overflow group and monitor that.

Mocking SignalR Client for better Unit Testing

Last Updated: January 22, 2015 | Created: June 10, 2014
Modal dialog showing task progress. Uses SignalR.

Why I needed to Mock SignalR

I build geographic modelling applications and they have lots of tasks that take a long time, sometimes many minutes, to run. I am currently developing an open source framework to help speed up the development of such ASP.NET MVC applications. Part of that framework therefore includes modules for handling long-running processes, with a progress bar, messages and user cancellation. The image above shows a simple example of a modal panel with a green progress bar at the top and a list of sent messages as the task works its way through the calculations (well, in this case it is test code, so the messages aren’t that exciting).

I have used SignalR for the communication channel between the JavaScript and the MVC5 server. I have found SignalR to be excellent; it makes two-way comms really easy.

However my application is fairly complicated because of all the things that can happen, like errors, user wanting to cancel, losing connection, etc. In particular the JavaScript client uses a state machine to handle all the options, and that needs checking. For this reason I wanted to unit test both ends. (Read my blog on Unit Testing for an in-depth look at how I use Unit Testing).

The C# end was fairly straightforward to test, as it was partitioned well. However for the JavaScript end I needed to Mock the SignalR JavaScript library. I could not find anything online so I wrote something myself.

Mocking the SignalR JavaScript Client

It turns out that it wasn’t that hard to mock the SignalR Client, although I should say I don’t use the autogenerated SignalR hub scripts, but use createHubProxy(‘HubName’), as I find that easier to manage than loading a dynamically created script. I have listed the mocked SignalR Client code below:

//This is a mock for a small part of SignalR's javascript client.
//This code does not mock autogenerated SignalR hub scripts as the
//ActionRunner code uses the connection.createHubProxy('HubName') method,
//followed by .on or .invoke to setup the receive and send methods respectively

var mockSignalRClient = (function ($) {

    var mock = {};

    //first the items used by unit testing to see what has happened
    mock.callLog = null;
    mock.onFunctionDict = null;
    mock.doneFunc = null;
    mock.failFunc = null;
    mock.errorFunc = null;
    //This logs a string with the caller's function name and the parameters. You must provide the function name, but it finds the function arguments itself
    //(NOTE: arguments.callee.caller only works in non-strict mode, so do not add 'use strict' to this file)
    mock.logStep = function (funcName) {
        var log = funcName + '(';
        var callerArgs = arguments.callee.caller.arguments;
        for (var i = 0; i < callerArgs.length; i++) {
            log += (typeof callerArgs[i] === 'function') ? 'function, ' : callerArgs[i] + ', ';
        }
        if (callerArgs.length > 0)
            log = log.slice(0, -2);    //remove the trailing ', '
        mock.callLog.push(log + ')');
    };
    mock.reset = function() {
        mock.callLog = [];
        mock.onFunctionDict = {};
        mock.doneFunc = null;
        mock.failFunc = null;
        mock.errorFunc = null;
    };

    //doneFail is the object returned by connection.start()
    var doneFail = {};
    doneFail.done = function (startFunc) {
        mock.logStep('connection.start.done');
        mock.doneFunc = startFunc;
        return doneFail;
    };
    doneFail.fail = function(failFunc) {
        mock.logStep('connection.start.fail');
        mock.failFunc = failFunc;
        return doneFail;
    };

    //Channel is the object returned by connection.createHubProxy
    var channel = {};
    channel.on = function (namedMessage, functionToCall) {
        mock.logStep('channel.on');
        mock.onFunctionDict[namedMessage] = functionToCall;
    };
    channel.invoke = function (actionName, actionGuid) {
        mock.logStep('channel.invoke');
    };

    //connection is the object returned by $.hubConnection
    var connection = {};
    connection.createHubProxy = function (hubName) {
        mock.logStep('connection.createHubProxy');
        return channel;
    };
    connection.error = function (errorFunc) {
        mock.logStep('connection.error');
        mock.errorFunc = errorFunc;
    };
    connection.start = function () {
        mock.logStep('connection.start');
        return doneFail;
    };
    connection.stop = function () {
        mock.logStep('connection.stop');
        return doneFail;
    };

    //now we run once the method to add the hubConnection function to jQuery
    $.hubConnection = function() {
        return connection;
    };

    //Return the mock base which has all the error feedback information in it
    return mock;

}(window.jQuery));

I think you will find most of this fairly easy to understand. The first part, from mock.callLog down to mock.reset, holds the variables and methods for Unit Testing to use. The rest of the code implements the methods which mock the SignalR methods I use in my code.

How did I use this Mock SignalR?

SignalR works by adding .hubConnection() to jQuery so it was simple to make the mock SignalR client do the same (see the $.hubConnection assignment near the end of the code above). My actual code checks that jQuery is present and then that $.hubConnection is defined, which ensures SignalR is loaded. Here is a piece of code from my ActionRunner.comms.js that does the initial setup, so you can see how it uses SignalR and therefore what I needed to Mock.

//This deals with setting up the SignalR connections and events
function setupTaskChannel() {

    actionRunner.setActionState(actionStates.connectingTransient);

    actionRunner.numErrorMessages = 0;

    //Setup connection and actionChannel with the functions to call
    var connection = $.hubConnection();

    //connection.logging = true;
    actionChannel = connection.createHubProxy('ActionHub');
    setupTaskFunctions();

    //Now make sure connection errors are handled
    connection.error(function(error) {
        actionRunner.setActionState(actionStates.failedLink);
        actionRunner.reportSystemError('SignalR error: ' + error);
    });
    //and start the connection and send the start message
    connection.start()
        .done(function() {
            startAction();
        })
        .fail(function(error) {
            actionRunner.setActionState(actionStates.failedConnecting);
            actionRunner.reportSystemError('SignalR connection error: ' + error);
        });
}
Jasmine Unit Test checking what SignalR functions were called

Using this mocking framework

There are two main ways I use it. Firstly you get a log of each method called, which helps ensure the right methods are called.

Secondly most of the calls to SignalR link functions to certain SignalR events. By capturing these functions the unit test can call them to simulate SignalR messages, errors etc. That allows a very good level of checking.
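
To make those two uses concrete, here is a minimal self-contained sketch. It does not use the real mock listed earlier: createMiniSignalRMock is a condensed stand-in I wrote for this example (no jQuery wiring, simpler log strings), and setupProgressChannel, the 'progress' message and the 'StartAction' call are hypothetical names, not the ActionRunner's.

```javascript
// Condensed, hypothetical stand-in for the mock listed earlier: it keeps only
// the call log and the captured handlers, and drops the jQuery wiring.
function createMiniSignalRMock() {
    var mock = { callLog: [], onFunctionDict: {}, doneFunc: null, failFunc: null };

    var doneFail = {
        done: function (fn) {
            mock.callLog.push('connection.start.done');
            mock.doneFunc = fn;
            return doneFail;
        },
        fail: function (fn) {
            mock.callLog.push('connection.start.fail');
            mock.failFunc = fn;
            return doneFail;
        }
    };
    var channel = {
        on: function (messageName, handler) {
            mock.callLog.push('channel.on(' + messageName + ')');
            mock.onFunctionDict[messageName] = handler;
        },
        invoke: function (actionName) {
            mock.callLog.push('channel.invoke(' + actionName + ')');
        }
    };
    mock.connection = {
        createHubProxy: function (hubName) {
            mock.callLog.push('connection.createHubProxy(' + hubName + ')');
            return channel;
        },
        start: function () {
            mock.callLog.push('connection.start()');
            return doneFail;
        }
    };
    return mock;
}

// Hypothetical code under test: wires up a channel and starts the connection
function setupProgressChannel(connection, state) {
    var channel = connection.createHubProxy('ActionHub');
    channel.on('progress', function (percent) { state.percent = percent; });
    connection.start()
        .done(function () { channel.invoke('StartAction'); })
        .fail(function (error) { state.error = error; });
}

var mock = createMiniSignalRMock();
var state = { percent: 0, error: null };
setupProgressChannel(mock.connection, state);

// Use 1: the log records which methods were called, and in what order
var logAfterSetup = mock.callLog.slice();
console.log(logAfterSetup);

// Use 2: call the captured functions to simulate SignalR events
mock.doneFunc();                        // pretend the connection started
mock.onFunctionDict['progress'](40);    // pretend the server sent 'progress'
console.log(state.percent, mock.callLog[mock.callLog.length - 1]);
```

In a real test the two console.log calls would be replaced by assertions in whichever framework you use.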

Getting the whole picture

In case you are interested in downloading the code or seeing how it was used, here is a series of links to various bits of code. These are taken from an open source project that I am currently working on, so the code is subject to change. I have listed all the various parts of the testing. UPDATE: These links had broken because the git repository had changed – sorry about that. Now fixed.

Conclusion

As I said in my previous post ‘Unit Testing in C# and JavaScript‘ I find mocking in JavaScript very easy and helpful. The ActionRunner is complex enough to need unit testing and I found mocking the various parts I wanted to replace fairly quick to implement.

I hope this helps you with SignalR and encourages you to mock other frameworks to help you test more easily. Happy coding.

 

 

 

Reflections on Unit Testing in C# and JavaScript

Last Updated: June 8, 2014 | Created: May 31, 2014

It has been a while since I wrote my first blog post called ‘Why I’m not scared of JavaScript any more‘ and since then I have written a fair amount of JavaScript. One of the things that I have found interesting is how I use different approaches to Unit Testing in each language. I’m not saying I have the ‘best’ way of Unit Testing each language, in fact I think some areas are a bit weak, but maybe you will find my experiences and reflections useful.

If you want to get to the meat of the post skim the next three paragraphs that set the scene and jump to ‘The differences in how I Unit Test between the two languages’ section.


First, respect to Gilles Ruppert…

Before I start I should say that I was heavily helped on the JavaScript side by an excellent programmer called Gilles Ruppert. As I was new to JavaScript I sought help and my son recommended Gilles. Gilles worked for me for about 10 weeks and set up the architecture of the single page application that I needed. He also set up an excellent JavaScript development environment using the Karma test runner, the Mocha test framework, Expect assertions and finally Sinon stubs/spies/mocks. I am building on the shoulders of a giant.

Setting the scene

I have just finished the first release of Spatial Modeller™, a medium-sized ASP.NET MVC4 web application written in C# and a large single page application written in JavaScript using Backbone with Marionette.

When Gilles finished we had just under 300 JavaScript Unit Tests, a handful written by me. Now I have added a lot more code and written an additional 400+ Unit Tests, taking the JavaScript tests to over 700.

On the C# side I use NUnit unit testing framework with a small amount of Moq for mocking etc. I use Resharper for running the testing inside Visual Studio. I have used this combination for some years.

Numerically I have fewer C# Unit Tests than JavaScript ones: currently C# has just under 500. However each test is often much more complicated and tests many items per test. The reasons for this difference in approach are one of the reasons I wrote this blog, so read on to find out why.

My style of Unit Testing

I really believe in Unit Testing and I don’t think I have a project that doesn’t use Unit Testing. However I am not a fan of Test Driven Development (TDD), as I have found that way I don’t come up with a coherent design. Writing Unit Tests is not an excuse for not doing some good designing of the system first.


The differences in how I Unit Test between the two languages

After maybe 10 months of JavaScript Unit Testing (and 5+ years of C# Unit Testing) I find it really interesting that I use different styles between the two. Here are the raw  differences and I’ll make some conclusions at the end.

1. In C# my first Unit Test is the compiler

Because C# is typed the compiler gives me lots of feedback. Therefore I may refactor code before I have even run it if I see opportunities to improve the code. This doesn’t happen with JavaScript, partly because JavaScript is not typed in the same way, so tools like JSHint cannot give such comprehensive feedback.

(hint: In Visual Studio you can install Web Essentials which will then run JSHint and JSCS when a JavaScript file is closed. It may not be as good as the C# compiler at spotting things, but it can help spot silly mistakes.)

This means I am willing to write more C# code before I Unit Test than I would with JavaScript.

2. Large JavaScript is much easier to partition than C#

The ‘up’ side of the not so tight checking in JavaScript is that it is much easier to build in smaller chunks. The ‘interface’ between these parts is often an object which, by the nature of JavaScript, is not directly linked between modules. As well as making Mocking really easy it seems to help me think about each part separately.

In C# I use interfaces and layers to separate chunks of code, but for something complex with five or more significant classes I tend to think of them as a whole. The typed nature makes them more ‘linked’ than the JavaScript equivalent and mocking is harder.

3. The unit testing frameworks make a BIG difference

I have found that test frameworks like Mocha and Jasmine have some features that encourage small tests. This is because these frameworks support ‘nested setups’, which other frameworks like NUnit don’t have. Let me explain what ‘nested setups’ are and why they encourage smaller, one item tests.

For most tests we need to set up some instances or data before we run the test. In NUnit we can run a setup method a) once at the start of the test class and/or b) once before each test is run. This is fine, but I find that I often need to run something specific before each test, which in NUnit you do at the beginning of the specific test. See the example below where I run a ‘once at start’ method called ‘SetUpFixture’ (the first method in the listing) and then an additional setup phase inside the specific test ‘CheckValidateTagError’ (under the //SETUP comment).

[TestFixtureSetUp]
public void SetUpFixture()
{
    using (var db = new SampleWebAppDb())
        DataLayerInitialise.ResetDatabaseToTestData(db, TestDataSelection.Simple);
}

[Test]
public void CheckValidateTagError()
{
    using (var db = new SampleWebAppDb())
    {
        //SETUP
        var existingTag = db.Tags.First();

        //ATTEMPT
        var dupTag = new Tag {Name = "duplicate slug", Slug = existingTag.Slug};
        db.Tags.Add(dupTag);
        var status = db.SaveChangesWithValidation();

        //VERIFY
        status.IsValid.ShouldEqual(false);
        status.Errors.Count.ShouldEqual(1);
        status.Errors[0].ErrorMessage.ShouldEqual("The Slug on tag 'duplicate slug' must be unique.");
    }
}

This is fine, but I find that the individual setups can become quite long. I then have three choices if I have multiple things to check: a) duplicate the individual setup code in each test (not good for DRY), b) write a small helper method which encapsulates the setup code (takes time and hides the setup), or c) test multiple items in the one test, which is what I have done above.

Both the Mocha and Jasmine JavaScript testing frameworks allow nesting of the setups, so in my last example I could have an outer group with the ‘SetUpFixture’ in it and then a small nested group, with a setup just for CheckValidateTagError, and then three separate tests for the three parts.
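
As a rough sketch of that restructuring, the block below recasts the duplicate-slug check as an outer group, a nested group with its own setup, and three one-item tests. Several things here are assumptions made so the example runs on its own under plain Node: the tiny describe/beforeEach/it functions are simplified stand-ins for Mocha's (with Mocha loaded you would delete them), saveTag is a hypothetical in-memory substitute for the real database validation, and the outer setup runs before every test rather than once as SetUpFixture does.

```javascript
// Minimal stand-ins for Mocha's describe/beforeEach/it so this sketch runs
// under plain Node. With Mocha loaded you would delete these three functions.
var results = [];
var setupStack = [];    // one list of beforeEach functions per open describe

function describe(name, body) {
    setupStack.push([]);
    body();
    setupStack.pop();
}

function beforeEach(setupFn) {
    setupStack[setupStack.length - 1].push(setupFn);
}

function it(name, testFn) {
    var context = {};
    // Run the setups of every enclosing group, outermost first
    setupStack.forEach(function (group) {
        group.forEach(function (setupFn) { setupFn.call(context); });
    });
    try {
        testFn.call(context);
        results.push({ name: name, passed: true });
    } catch (e) {
        results.push({ name: name, passed: false, error: e.message });
    }
}

function check(condition, message) {
    if (!condition) { throw new Error(message); }
}

// Hypothetical in-memory version of the duplicate-slug validation
function saveTag(existingTags, newTag) {
    var errors = [];
    var duplicate = existingTags.some(function (tag) { return tag.slug === newTag.slug; });
    if (duplicate) {
        errors.push("The Slug on tag '" + newTag.name + "' must be unique.");
    }
    return { isValid: errors.length === 0, errors: errors };
}

describe('Tag validation', function () {
    // Outer setup: plays the role of SetUpFixture (here it runs before each test)
    beforeEach(function () {
        this.tags = [{ name: 'Good', slug: 'good' }, { name: 'Ugly', slug: 'ugly' }];
    });

    describe('adding a tag with a duplicate slug', function () {
        // Nested setup: only the tests in this group run it
        beforeEach(function () {
            var dupTag = { name: 'duplicate slug', slug: this.tags[0].slug };
            this.status = saveTag(this.tags, dupTag);
        });

        it('should be invalid', function () {
            check(this.status.isValid === false, 'expected an invalid status');
        });
        it('should report exactly one error', function () {
            check(this.status.errors.length === 1, 'expected one error');
        });
        it('should name the offending tag', function () {
            check(this.status.errors[0] === "The Slug on tag 'duplicate slug' must be unique.",
                'unexpected error message');
        });
    });
});

console.log(results);
```

Each of the three checks from the C# test becomes its own test, and the shared setup is written once in the nested beforeEach rather than duplicated or hidden in a helper.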

Here is an example from my actual code with some nested test groups, which Mocha does with the ‘describe’ command:

// ... various require code

describe('views/InfoPanel', function () {
    afterEach(function () {
        delete this.model;
        delete this.view;
    });

    describe('single item to display, no grades', function () {
        beforeEach(function () {
            this.layerInfo = helpers.loadFixture('/api/schemes/1').layerInfos[0];
            this.members = helpers.loadFixture('/api/layers/1/members');
            this.model = new InfoPanelModel({ selectedMembers: this.members });
            this.view = new InfoPanelView({ model: this.model });
        });
        afterEach(function () {
            delete this.layerInfo;
            delete this.layer;
            delete this.members;
        });

        it('should be a Backbone.View', function () {
            expect(this.view).to.be.a(Backbone.View);
        });

        describe('after showing view', function () {
            beforeEach(function () {
                this.mainRegion.show(this.view);
            });

            it('should have something on view', function () {
                expect(this.view.$el).to.have.length(1);
            });

            it('title in window header should start with layer name', function () {
                // ... etc.
            });

            // ... further tests elided
        });
    });
});

You can see the first setup, called ‘beforeEach’, in the ‘single item to display, no grades’ group and the nested ‘beforeEach’ in the ‘after showing view’ group. This works really well and is the reason why my JavaScript Unit Tests almost always check one item per test. I really like Mocha’s approach and miss it in NUnit.

4. Mocking databases is hard!

One thing that definitely affects me is that the database side is hard to mock. I have used a number of ORMs, and direct T-SQL, and there isn’t a nice way to replace the real code with something for testing that will catch the same errors. Foreign keys, relational fixup etc. are really hard to mock (just have a look at my post Updating a many to many relationship in entity framework to see the complications that arise).

This means some of my Unit Tests in C# are more like integration tests, as I use the real database layer, but with a test database loaded with test data. There is a cost to that, but my experience is that anything else lets errors through.

JavaScript using backbone.js does access data, but in a more controlled way and Gilles set up the classic ‘fixtures’ and filled it with test data created by the C# application. That makes JavaScript testing much easier, but only because the server is dealing with the complex parts.

5. PS: You must be able to debug inside Unit Tests

It goes without saying that Unit Tests are for finding bugs before you use the code in the real application. Therefore you MUST be able to debug code, with breakpoints and variable inspection etc., when you are running a unit test.

I mention this only because Resharper, which I find really useful, is very bad in this area when unit testing JavaScript. Resharper makes running unit tests really easy, especially on JavaScript using QUnit or Jasmine. However I have found it almost impossible to use it to debug Jasmine tests, and I assume QUnit tests, when running in the Resharper harness.

However the Mocha JavaScript test environment that Gilles set up is excellent on that score. It allows running with a connected browser, use of the debugger; statement to cause pauses when the debugger is open, and use of .only() to allow only certain tests to be run.

How this affects my Unit Testing

  1. My JavaScript Unit Tests are shorter and just test one item per test.
  2. I write new Unit Test groups more often in JavaScript than in C#.
  3. My C# Unit Tests are often more complex and often test multiple items per test. This is because:
    1. The setup code is complicated or time consuming.
    2. NUnit does not have nested setups like Mocha does
      (see The unit testing frameworks make a BIG difference above).
  4. I often write more C# code before writing Unit Tests as the compiler catches a lot of ‘silly’ mistakes in C# which can get through in JavaScript.
  5. C# seems more ‘linked together’ than JavaScript, which affects mocking and designing.

I know that testing multiple items in one test, point 3 above, is often seen as ‘wrong’ by some, but the cost of the code to set it up and the time to run the tests are both factors I have weighed up in coming to my current style.

I have more C# code than JavaScript code by maybe 3 to 1. My 500 C# Unit Tests take nearly 2 minutes to run while my 700 JavaScript Unit Tests take 30 seconds.

Conclusion

I like both C# and JavaScript. They each have their strengths and weaknesses. I also could not work on a project without the feedback that Unit Tests give me on the quality of the software.

With all of this in mind it is worth thinking about how you Unit Test in each language. The language, the need, the project and the environment will change the way Unit Tests can and should be approached. Happy programming.

Updating a many to many relationship in entity framework

Last Updated: March 2, 2017 | Created: May 22, 2014
Quick Summary
I found conflicting information on how to update many-to-many relationships in entity framework. I therefore decided to research the problem properly. This post shares with you my research.
I believe I now understand how entity framework works with automatic many-to-many relationships and how to implement my own linking table if I need to.

I had problems working out the best way to update a many to many relationship in Entity Framework (referred to as EF). I ended up doing some research and I thought others might like to see what I found out.

Firstly the environment I am working in:

  • I am using EF 6.1 in Code First mode. I am pretty sure what I have found applies across the board, but I have only tested on EF 6.1
  • I am working in a web application (MVC 5) so I have to handle disconnected classes.

If you just want the quick answer then go to the Part 1 Conclusion. The Conclusions also have a link to a live web site where you can see a many-to-many relationship used for real (with access to the source too). However it’s worth reading some of the examples so you understand why it works.

UPDATE 2017: I have written a new version of this article for Entity Framework Core (EF Core) to celebrate the release of my book Entity Framework Core in Action.

EF Core does NOT work the same way as EF6.x and the new article provides information on how to handle many-to-many relationships in EF Core.


Part 1: Using EF’s automatic linking of many-to-many relationships

In this example I have simply linked the tag and the post by having ICollection<Tag> Tags in the Post class and ICollection<Post> Posts in the Tag class. See the navigation properties in Tag and Post in the diagram below:

Tag, Post and Blog entity framework classes (blog dimmed as not important)

As explained in the Microsoft tutorial EF will create a linking table called TagPosts or PostTags. This link table, which you never see, provides the many to many linking. This way a post can have none to many tags and tags can be in none to many posts.

So the action I want to perform is to change the tags on a post. Sounds simple, and it is in this example if you watch out for a few things.

First answer: update many-to-many relationship in connected state

Below are two Unit Tests which don’t need to worry about disconnected classes. These are good starting points as it’s important to know how EF handles the simplest case. The first test adds a tag to a post while the second one replaces the current tags with a new set. Note that the first post starts with two tags: ‘good’ and ‘ugly’.

[Test]
public void Check25UpdatePostToAddTagOk()
{
    using (var db = new SampleWebAppDb())
    {
        //SETUP
        var badTag = db.Tags.Single(x => x.Slug == "bad");
        var firstPost = db.Posts.First();

        //ATTEMPT
        db.Entry(firstPost).Collection( x => x.Tags).Load();
        firstPost.Tags.Add( badTag);
        db.SaveChanges();

        //VERIFY
        firstPost = db.Blogs.Include(x => x.Posts.Select(y => y.Tags)).First()
                            .Posts.First();
        firstPost.Tags.Count.ShouldEqual(3);
    }
}

[Test]
public void Check26ReplaceTagsOk()
{
    using (var db = new SampleWebAppDb())
    {
        //SETUP
        var firstPost = db.Posts.First();
        var tagsNotInFirstPostTracked =
             db.Tags.Where(x => x.Posts.All(y => y.PostId != firstPost.PostId))
                    .ToList();

        //ATTEMPT
        db.Entry(firstPost).Collection(x => x.Tags).Load();
        firstPost.Tags = tagsNotInFirstPostTracked;
        db.SaveChanges();

        //VERIFY
        firstPost = db.Blogs.Include(x => x.Posts.Select(y => y.Tags)).First()
                            .Posts.First();
        firstPost.Tags.Count.ShouldEqual(1);
    }
}

Now, you should see the important line in each test: the call to db.Entry(firstPost).Collection(x => x.Tags).Load(). This is really important as it loads the post’s current tags so that EF can track when they are changed. There are a number of alternatives to using .Load() that would also work.

  1. Add the appropriate .Include() when loading the posts, e.g.
    var firstPost = db.Posts.Include(post => post.Tags).First();
  2. Make the Tags property in the Post class virtual, e.g.
    public virtual ICollection<Tag> Tags { get; set; }

Having used .Load(), .Include() or a virtual property, EF tracks the data and then does all the work to add or remove the rows in the hidden link table. This is very clever and very useful.

I wanted to really prove to myself that my findings were correct, so I wrote another Unit Test to test the failure case. The unit test below shows conclusively that if you don’t load the current Tags you get the wrong result. As I said earlier the first post started with the ‘good’ and ‘ugly’ tags and should have ended up with ONLY the ‘bad’ tag. However the Unit Test shows it ended up with all three.

[Test]
public void Check05ReplaceAllTagsNoIncludeBad()
{
    using (var db = new MyDbContext())
    {
        //SETUP
        var snap = new DbSnapShot(db);
        var firstPost = db.Posts.First();
        var badTag = db.Tags.SingleOrDefault(x => x.Slug == "bad");

        //ATTEMPT
        firstPost.Tags = new List<Tag> { badTag };
        db.SaveChanges();

        //VERIFY
        snap.CheckSnapShot(db, 0, 1);
        var readPost = db.Posts.Include(x => x.Tags)
                         .SingleOrDefault(x => x.PostId == firstPost.PostId);
        CollectionAssert.AreEquivalent(new[] { "good", "bad", "ugly" },
                                       readPost.Tags.Select(x => x.Slug));
    }
}

As you can see from the above example it ended up with all three, which is not the right answer.

Second answer: update many-to-many relationship in disconnected state

When working with a web application like MVC an update is done in two stages. Firstly the current data is sent to the user, who updates it. Secondly, when the user presses submit the new data is sent back, but it’s now disconnected from the database, i.e. EF is not tracking it. This makes for a slightly more complicated case, but still fairly easy to handle.

Below is my method for updating the posts. In this case I have filled a MultiSelectList with all the tags and it returns the ids of the tags that the user has chosen. I should also point out I use the same method for create and update, hence the test on post.PostId to see if I need to load the current tags collection.

private void ChangeTagsBasedOnMultiSelectList(SampleWebAppDb db, Post post)
{
   var requiredTagIds = UserChosenTags.GetFinalSelectionAsInts();

   if (post.PostId != 0)
       //This is an update so we need to load the tags
       db.Entry(post).Collection(p => p.Tags).Load();

   var newTagsForPost = db.Tags
                          .Where(x => requiredTagIds.Contains(x.TagId)).ToList();
   post.Tags = newTagsForPost;
}

The important thing is that I loaded new tags from the database so they are tracked.

Conclusion from part 1

If you want to update an EF provided many-to-many link then:

  1. Pick one end of a many-to-many relationship to update. EF will sort out the other end for you.
  2. Make sure that the collection you will change is loaded, either by putting virtual on the property, using .Include() in the initial load or using .Load() later to get the collection.
    Note: .Include() is the best performing of the three as it means the data is loaded in the initial SQL command. The other two, .Load() and virtual, require a second SQL access.
  3. Make sure that any new items, e.g. Tags in my example, are loaded as tracked entities.
    This is the normal case, but in the disconnected state, i.e. in a web app, that might not be true (see examples above).
  4. Call EF’s .SaveChanges() to save the changes.

Because you have tracked entities, EF’s change tracking will notice the changes and sort out the adding or removing of rows in the hidden link table for you.

Live web site with many-to-many relationship update

As part of my GenericServices open-source project I have built two example web sites. One of them has a simple list of blog Posts, each of which has one or many Tags. The Posts and Tags are linked by a many-to-many table. This web site is live and you can try it yourself.

  • Edit a post at http://samplemvcwebapp.net/Posts and change the tags – Note: if accessed from desktop then you need to use control-click to select multiple tags.
  • You can also see the code that updates this many-to-many relationship via the open-source project SampleMvcWebApp – see the code right at the end of the DetailPostDto.cs class in the method ChangeTagsBasedOnMultiSelectList.

Part 2: Using your own link table for many-to-many relationships

There are cases when you want to have your own link table, possibly because you want to include some other property in the link table. Below is a case where I have simply created my own link table with no extra properties. As you can see the PostTagLink table has a row for every Tag / Post link just like the hidden table that EF produced in the first instance. However now that we produced this we need to keep it up to date.

Tag, PostTagLink and Post classes

So let us carry out the same type of Unit Test (connected) code and the MVC (disconnected) code we did in the first Example.

First answer: update many-to-many relationship in connected state

The two unit tests below now show that we need to manipulate our PostTagLink table entries, not the navigation properties in the Post. After we have saved the changes the Post’s AllocatedTags list will reflect the changes through EF’s relational fixup done on any tracked classes.

[Test]
public void Check25UpdatePostToAddTagOk()
{
    using (var db = new SampleWebAppDb())
    {
        //SETUP
        var badTag = db.Tags.Single(x => x.Slug == "bad");
        var firstPost = db.Posts.First();

        //ATTEMPT
        db.PostTagLinks.Add(new PostTagLink { InPost = firstPost, HasTag = badTag });
        db.SaveChanges();

        //VERIFY
        firstPost = db.Posts.Include(x => x.AllocatedTags).First();
        firstPost.AllocatedTags.Count.ShouldEqual(3);
    }
}

[Test]
public void Check26UpdatePostToRemoveTagOk()
{
    using (var db = new SampleWebAppDb())
    {
        //SETUP
        var firstPost = db.Posts.First();
        var postTagLinksToRemove =
             db.PostTagLinks.First(x => x.PostId == firstPost.PostId);

        //ATTEMPT
        db.PostTagLinks.Remove(postTagLinksToRemove);
        db.SaveChanges();

        //VERIFY
        firstPost = db.Posts.Include(x => x.AllocatedTags).First();
        firstPost.AllocatedTags.Count.ShouldEqual(1);
    }
}

I think the code speaks for itself, i.e. you add or remove rows from the PostTagLinks table to change the links.

Second answer: update many-to-many relationship in disconnected state

Just like the first example, when using MVC I have filled a MultiSelectList with all the tags and it returns the ids of the tags that the user has chosen. So now I need to add/remove rows from the PostTagLinks table. However I try not to change links that are still needed, hence I produce tagLinksToDelete and tagLinksToAdd as it’s more efficient.

private void ChangeTagsBasedOnMultiSelectList(SampleWebAppDb db, Post post)
{
    var requiredTagIds = UserChosenTags.GetFinalSelectionAsInts();

    var tagLinksToDelete =
        db.PostTagLinks.Where(x => !requiredTagIds.Contains(x.TagId) && x.PostId == PostId).ToList();
    var tagLinksToAdd = requiredTagIds
        .Where(x => !db.PostTagLinks.Any(y => y.TagId == x && y.PostId == PostId))
        .Select(z => new PostTagLink {InPost = post, HasTag = db.Tags.Find(z)}).ToList();

    //We get the PostTagLinks entries right, which is what EF needs
    tagLinksToDelete.ForEach(x => db.PostTagLinks.Remove(x));
    tagLinksToAdd.ForEach(x => db.PostTagLinks.Add(x));
    //********************************************************************
    //If using EF 6 you could use the more efficient RemoveRange, e.g.
    //db.PostTagLinks.RemoveRange(tagLinksToDelete);
    //db.PostTagLinks.AddRange(tagLinksToAdd);
    //********************************************************************
}
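
The split into links to delete and links to add is really just a set difference in each direction. Here is a minimal in-memory sketch of that idea, with no database involved. The TagLinkDiff class and its Compute method are names I have made up for illustration; they are not part of EF or of the SampleMvcWebApp code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of EF or SampleMvcWebApp): works out which
// tag links to remove and which to add, leaving wanted links untouched.
public static class TagLinkDiff
{
    public static (List<int> ToDelete, List<int> ToAdd) Compute(
        IEnumerable<int> currentTagIds, IEnumerable<int> requiredTagIds)
    {
        var current = new HashSet<int>(currentTagIds);
        var required = new HashSet<int>(requiredTagIds);

        // Links that exist now but are no longer wanted
        var toDelete = current.Where(id => !required.Contains(id))
                              .OrderBy(id => id).ToList();
        // Links that are wanted but do not exist yet
        var toAdd = required.Where(id => !current.Contains(id))
                            .OrderBy(id => id).ToList();
        return (toDelete, toAdd);
    }
}
```

For example, if the post currently has tag ids 1 and 3 and the user chose 2 and 3, Compute returns 1 to delete and 2 to add, and the link for tag 3 is never touched.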

Conclusion from part 2

If you have your own link table for handling many-to-many relationships you need to:

  1. Add or remove entries from your link table, in my case called PostTagLinks.
  2. Make sure that any new items, e.g. Tag, added to your link table are loaded as tracked entities.
  3. Call EF’s .SaveChanges() to persist the changes.

Well, that took longer than I expected to write, but I hope it helps others in really understanding what is going on underneath EF’s many-to-many relationships. Certainly I now feel much more confident on the topic.

Additional note: You will see I use EF commands directly and do not use a repository or UnitOfWork pattern when accessing the database. You might like to read my post on ‘Is the Repository pattern useful with Entity Framework?‘ as to why I do that.

Is the Repository pattern useful with Entity Framework?

Last Updated: February 21, 2018 | Created: May 10, 2014
Quick Summary
This is, hopefully, a critical review of whether the repository and UnitOfWork pattern is still useful with the modern implementations of Microsoft’s Entity Framework ORM. I look at the reasons why people are suggesting the repository pattern is not useful and compare this with my own experience of using the repository and UnitOfWork over many years.

This is a series:

  1. Part 1: Analysing whether the Repository pattern is useful with Entity Framework (this article).
  2. Part 2: Four months on – my solution to replacing the Repository pattern.
  3. UPDATE (2018): Big re-write to take into account Entity Framework Core, and further learning.

I have just finished the first release of Spatial Modeller™, a medium sized ASP.NET MVC web application. I am now undertaking a critical review of its design and implementation to see what I could improve in V2. The design of Spatial Modeller™ is a fairly standard four layer architecture.

Four layer design drawing
Software design of Spatial Modeller Web app

I have used the repository pattern and UnitOfWork pattern over many years, even back in the ADO.NET days. In Spatial Modeller™ I think I have perfected the design and use of these patterns and I am quite pleased with how they help the overall design.

However we don’t learn unless we hear from others that have a counter view, so let me try and summarise the arguments I have seen against the repository/Unit of Work pattern.

What people are saying against the repository pattern

As part of my review of the current Spatial Modeller™ design I found some blog posts that make a compelling case for ditching the repository. The most cogent and well thought-out post of this kind is ‘Repositories On Top UnitOfWork Are Not a Good Idea’. Rob Conery’s main point is that the repository & UnitOfWork just duplicate what the Entity Framework (EF) DbContext gives you anyway, so why hide a perfectly good framework behind a façade that adds no value? Rob calls this ‘this over-abstraction silliness’.

Another blog is ‘Why Entity Framework renders the Repository pattern obsolete’. In this Isaac Abraham adds that the repository doesn’t make testing any easier, which is one thing it was supposed to do.

I should also say I found another blog, ‘Say No to the Repository Pattern in your DAL’, which says that using a repository removes access to LINQ querying, the ability to include/prefetch, and async support (EF 6). However this is not true if your repository passes IQueryable items, as this allows all these features.
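
To show what I mean by a repository that passes IQueryable items, here is a minimal sketch. All the names in it (IReadRepository, InMemoryRepository, PostQueries) are made up for illustration, and the in-memory list is only there so the example runs without a database; it is not how Spatial Modeller™ is written.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Post
{
    public int PostId { get; set; }
    public string Title { get; set; }
}

// Hypothetical repository interface: it returns IQueryable<T> rather than a
// materialised list, so the caller can still compose LINQ on top of it.
public interface IReadRepository<T>
{
    IQueryable<T> GetAll();
}

// In-memory stand-in (an assumption for this sketch) so it runs without a
// database. An EF-backed version would return its DbSet<T> from GetAll().
public class InMemoryRepository<T> : IReadRepository<T>
{
    private readonly List<T> _items;
    public InMemoryRepository(IEnumerable<T> items) { _items = items.ToList(); }
    public IQueryable<T> GetAll() { return _items.AsQueryable(); }
}

public static class PostQueries
{
    // The filtering, ordering and projection are added by the caller,
    // not baked into the repository.
    public static List<string> TitlesContaining(IReadRepository<Post> repo, string text)
    {
        return repo.GetAll()
                   .Where(p => p.Title.Contains(text))
                   .OrderBy(p => p.Title)
                   .Select(p => p.Title)
                   .ToList();
    }
}
```

Because the caller receives an IQueryable, the Where/OrderBy/Select are composed before the query is executed. In EF 6, .Include() and the async methods such as ToListAsync() are extension methods on IQueryable, which is why they remain available through a repository written this way.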

So, are they right?

Reviewing my use of repository and UnitOfWork patterns

I have been using repositories for a while now and my reflection is that when the ORMs weren’t so good then I really needed repositories.

I built a geographic modelling application for a project to improve HIV/AIDS testing in South Africa. I used LINQ to SQL, but because it didn’t support spatial types I had to write a lot of T-SQL stored procedures and use ADO.NET code to access the spatial values. The database code was therefore a pretty complicated mix of technologies and a repository pattern acted as a good Façade to make it look seamless. I definitely think the software design was helped by the repository pattern.

However EF has come a long way since then. Spatial Modeller™ used EF 5, which was the first version to support spatial types (and enums, which is nice). I used the repository pattern and it has worked well, but I don’t think it added as much value as in the South Africa project as the EF code was pretty clean. Therefore I sympathise with the sentiments of Rob etc.

My views on the pros and cons of repository and UnitOfWork patterns

Let me try and review the pros/cons of the repository/UnitOfWork patterns in as even-handed a way as I can. Here are my views.

Benefits of repository and UnitOfWork patterns (with best first)

  1. Domain-Driven Design: I find well designed repositories can present a much more domain specific view of the database. Commands like GetAllMembersInLayer make a lot more sense than a complex linq command. I think this is a significant advantage. However Rob Conery’s post suggests another solution using Command/Query Objects and references an excellent post by Jimmy Bogard called ‘Favor query objects over repositories’.
  2. Aggregation: I have some quite complex geographic data consisting of six or seven closely interrelated classes. These are treated as one entity, with one repository accessing them as a group. This hides complexity and I really like this.
  3. Security of data: one problem with EF is if you load data normally and then change it by accident it will update the database on the next SaveChanges. My repositories are very specific about what they return, with GetTracked and GetUntracked versions of commands. And for things like audit trails I only ever return them as untracked. (See the box below if you are not clear on what EF tracking is.)
  4. Hiding complex T-SQL commands: As I described before, sometimes database accesses need a more sophisticated command that needs T-SQL. My view is that these should be in only one place to help maintenance. Again I should point out that the Command/Query Objects in Rob Conery’s post (see item 1) could also handle this.

Not sure what EF Tracked is?
A normal EF database command using DbContext returns ‘attached’ classes. When SaveChanges() is called it checks these attached classes and writes any changed data back to the database. You need to append AsNoTracking() to the linq command if you want a read-only copy. Untracked classes are also slightly faster to load.

Non-benefits of repository and UnitOfWork patterns are:

  1. More code: Repositories and UnitOfWork are more code that needs developing, maintaining and testing.
  2. Testing: I totally agree with Isaac Abraham. Repositories are no easier to mock than IDbSet. I have tried on many occasions to mock the database, but with complex, interrelated classes it is really hard to get right. I now use EF and a test database in my unit tests for most of the complex models. It’s a bit slower but all my mocks could not handle the relational fixup that EF does.

Conclusion

I have to say I think Command/Query Objects mentioned by Rob Conery and described in detail in Jimmy Bogard’s post called ‘Favor query objects over repositories’ have a lot going for them. While the repository/UnitOfWork pattern has served me well maybe EF has progressed enough to use it directly, with the help of Command/Query Objects to make access more Domain-Driven in nature.
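
To give a feel for what a query object might look like, here is a minimal sketch of my own. It is only an assumption of the general shape, not the exact design from Rob Conery’s or Jimmy Bogard’s posts, and the Member class, its LayerId property and the MembersInLayerQuery name are all made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity, loosely based on the GetAllMembersInLayer idea
public class Member
{
    public int MemberId { get; set; }
    public int LayerId { get; set; }
    public string Name { get; set; }
}

// Hypothetical query object: one small class per domain-level question
public class MembersInLayerQuery
{
    private readonly int _layerId;

    public MembersInLayerQuery(int layerId) { _layerId = layerId; }

    // Takes any IQueryable<Member> (e.g. a DbSet<Member> when using EF)
    // and returns the filtered query, which is still composable.
    public IQueryable<Member> Execute(IQueryable<Member> members)
    {
        return members.Where(m => m.LayerId == _layerId)
                      .OrderBy(m => m.Name);
    }
}
```

The calling code then reads as new MembersInLayerQuery(layerId).Execute(db.Members), where db.Members is again an assumed name. That keeps the domain-specific naming I like about repositories without wrapping the whole DbContext.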

What I do in a case like this is build a small app to try out a new approach. I ensure that the app has a few complex things that I know have proved a problem in the past, but is small enough to not take too long. This fits in well with me reviewing the design of Spatial Modeller™ before starting on version 2. I already have a small app looking at using T4 code generation to produce some of the boilerplate code and I will extend the app to try a non-repository approach. Should be interesting.

UPDATE – 4 months & 8 months later
Read the second part of this series after four months of development and my conclusions on how to replace the Repository/Unit of Work pattern with new ways of working.  Quite a journey, but I think a useful development. Have a read and see what you think. PS. There is a live example web site showing some of the approaches which you can play with and access the code.
Happy coding!

Why I’m not scared of JavaScript any more

Last Updated: May 2, 2014 | Created: April 10, 2013

My recent programming experience has been in C# and .NET, mainly on Windows apps to support my business. However I needed to move to a web-based application, especially for our work in Africa.

The problem was that the only bits of JavaScript I had seen up to this point were a few lines to, say, handle a button click. It looked very unstructured and, for a person used to OOP and C#, rather nasty. So, what do I do in these cases… I bought a book, or in this case I bought two books. The effect was magic.

Book 1: Learning JavaScript Design Patterns

The first book I read was Learning JavaScript Design Patterns by Addy Osmani. I bought this because a) it was on design patterns and b) it was very recent. The last is important because JavaScript is developing at such a fast pace.

This book was just what I needed. It opened my eyes to how to construct JavaScript cleanly and provide good structure to my programming. It made me much more confident about how to write well structured JavaScript.

I really recommend this book to anyone who is a programmer but doesn’t really know about the new ways of building JavaScript programs.

Book 2: Effective JavaScript

However I knew that a book on design patterns wouldn’t tell me the nitty gritty of the language. I already had the book JavaScript: The Good Parts by Douglas Crockford, but that hadn’t helped me to get inside the thinking of JavaScript.

I found a recent book called Effective JavaScript by David Herman. I have two Effective C# books and they really get into the depths of C#, so I got the JavaScript version. I was not disappointed.

Where the Design Pattern book gave me the overview, the Effective JavaScript book took me deep into the inner workings of JavaScript. Also, from skim reading it I got a good idea of what to do and, more importantly, what NOT to do. This is now my bible of good coding standards in JavaScript.

Did the books help?

So, did reading these books make me a great JavaScript programmer? Of course not. But I was confident enough to design a fairly complex geospatial data visualisation system in JavaScript to work with OpenLayers.

I then went on to write the first basic module and some constructors to do the initial display of data on a map. They have a nice separation of concerns and when I wanted to refactor something it was nice and easy. It also came together quite easily with few bugs, although I really miss having Unit Tests (that is another blog post to come!). It’s likely I will get a JavaScript Guru to build the proper system, but at least I feel confident that I know what is going on now.

Maybe in a future post I will write why I actually LIKE JavaScript now because there are some bits that are really nice. But more importantly JavaScript is now the primary (only?) way ahead for responsive web design.