Updating a many to many relationship in entity framework

Last Updated: March 2, 2017 | Created: May 22, 2014
Quick Summary
I found conflicting information on how to update many-to-many relationships in entity framework. I therefore decided to research the problem properly. This post shares with you my research.
I believe I now understand how entity framework works with automatic many-to-many relationships and how to implement my own linking table if I need to.

I had problems working out the best way to update a many to many relationship in Entity Framework (referred to as EF). I ended up doing some research and I thought others might like to see what I found out.

Firstly the environment I am working in:

  • I am using EF 6.1 in Code First mode. I am pretty sure what I have found applies across the board, but I have only tested on EF 6.1
  • I am working in a web application (MVC 5) so I have to handle disconnected classes.

If you just want the quick answer then goto the Part 1 Conclusion. The Conclusions also have a link to a live web site where you can see a many-to-many relationship used for real (with access to the source too). However its worth reading some of the examples so you understand why it works.

UPDATE 2017: I have written a new version of this article for Entity Framework Core (EF Core) to celebrate the release of my book Entity Framework Core in Action.

EF Core does NOT work the same way as EF6.x and the new article provides information on how to handle many-to-many relationships in EF Core.


Part 1: Using EF’s automatic linking of many-to-many relationships

In this example I have simply linked that tag and the post by having ICollection<Tag> Tags in the Post class and ICollection<Post> Posts in the Tag class. See navigation properties in Tag and Post in diagram below:

Tag, Post and Blog entity framework classes (blog dimmed as not important)
Tag, Post and Blog entity framework classes

As explained in the Microsoft tutorial EF will create a linking table called TagPosts or PostTags. This link table, which you never see, provides the many to many linking. This way a post can have none to many tags and tags can be in none to many posts.

So the action I want to perform is to change the tags on a post. Sounds simple, and it is in this example if you watch out for a few things.

First answer: update many-to-many relationship in connected state

Below are two unit Unit Tests which doesn’t need to worry about disconnected classes. These are good starting points as its important to know how EF does the simplest case. The first test adds a tag to a post while the second one replaces the current tags with a new set. Note that the first post starts with two tags; ‘good’ and ‘ugly’.

[Test]
public void Check25UpdatePostToAddTagOk()
{
    using (var db = new SampleWebAppDb())
    {
        //SETUP
        var badTag = db.Tags.Single(x => x.Slug == "bad");
        var firstPost = db.Posts.First();

        //ATTEMPT
        db.Entry(firstPost).Collection( x => x.Tags).Load();
        firstPost.Tags.Add( badTag);
        db.SaveChanges();

        //VERIFY
        firstPost = db.Blogs.Include(x => x.Posts.Select(y => y.Tags)).First()
                            .Posts.First();
        firstPost.Tags.Count.ShouldEqual(3);
    }
}

[Test]
public void Check26ReplaceTagsOk()
{
    using (var db = new SampleWebAppDb())
    {
        //SETUP
        var firstPost = db.Posts.First();
        var tagsNotInFirstPostTracked =
             db.Tags.Where(x => x.Posts.All(y => y.PostId != firstPost.PostId))
                    .ToList();

        //ATTEMPT
        db.Entry(firstPost).Collection(x => x.Tags).Load();
        firstPost.Tags = tagsNotInFirstPostTracked;
        db.SaveChanges();

        //VERIFY
        firstPost = db.Blogs.Include(x => x.Posts.Select(y => y.Tags)).First()
                            .Posts.First();
        firstPost.Tags.Count.ShouldEqual(1);
    }
}

Now, you should see the two important lines (line 11 and 34) which loads the current Tags in the post. This is really important as it loads the post’s current tags so that EF can track when they are changed. There are a number of alternatives to using .Load(), that would work.

  1. Add the appropriate .Include() when loading the posts, e.g.
    var firstPost = db.Posts.Include( post => post.Tags).First()
  2. Make the Tags  property in the Post class virtual, e.g.
    public virtual Collection<Tag> Tags { get; set; }

Having used .Load(), .Include() or a virtual property then EF tracks the data and then does all the work to remove the TagLinks rows. This is very clever and very useful.

I wanted to really prove to myself that my findings were correct out so I wrote another Unit Test to test the failure case. The unit test below shows conclusively that if you don’t load the current Tags it gets the wrong result. As I said earlier the first post started with the ‘good’ and ‘ugly’ tags and should have ended up with ONLY the ‘bad’ tag. However the Unit Test shows it ended up with all three.

[Test]
public void Check05ReplaceAllTagsNoIncludeBad()
{
    using (var db = new MyDbContext())
    {
        //SETUP
        var snap = new DbSnapShot(db);
        var firstPost = db.Posts.First();
        var badTag = db.Tags.SingleOrDefault(x => x.Slug == "bad");

        //ATTEMPT
        firstPost.Tags = new List { badTag };
        db.SaveChanges();

        //VERIFY
        snap.CheckSnapShot(db, 0, 1);
        var readPost = db.Posts.Include(x => x.Tags)
                         .SingleOrDefault(x => x.PostId == firstPost.PostId);
        CollectionAssert.AreEquivalent(new[] { "good", "bad", "ugly" },
                                       readPost.Tags.Select(x => x.Slug));
    }
}

As you can see from the above example it ended up with all three, which is not the right answer.

Second answer: update many-to-many relationship in disconnected state

When working with a web application like MVC an update is done in two stages. Firstly the current data is sent to the user who updates it. Secondly when the user presses submit the new data is sent back, but its now disconnected from the database, i.e. EF is not tracking it. This makes for a slightly more complicated case, but still fairly easy to handle.

Below is my method for updating the posts. In this case I have filled a MultiSelectList with all the tags and it returns the ids of the tags that the user has chosen. I should also point out I use the same method for create and update, hence the test on line 5 to see if I need to load the current tags collection.

private void ChangeTagsBasedOnMultiSelectList(SampleWebAppDb db, Post post)
{
   var requiredTagIds = UserChosenTags.GetFinalSelectionAsInts();

   if (post.PostId != 0)
       //This is an update so we need to load the tags
       db.Entry(post).Collection(p => p.Tags).Load();

   var newTagsForPost = db.Tags
                          .Where(x => requiredTagIds.Contains(x.TagId)).ToList();
   post.Tags = newTagsForPost;
}

The important thing is that I loaded new tags from the database so they are tracked.

Conclusion from part 1

If you want to update an EF provided many-to-many link then:

  1. Pick one end of a many-to-many relationship to update. EF will sort out the other end for you.
  2. Make sure that the collection you will change is loaded, either by putting virtual on the property, using .Include() in the initial load or using .Load() later to get the collection.
    Note: .Include() is the best performing of the three as it means the data is loaded in the initial SQL command. The other two, .Load() and virtual, require a second SQL access.
  3. Make sure the any new items, e.g. Tags in my example, are loaded as a tracked entity.
    This is normal case, but in the disconnected state, i.e. in a web app, that might not be true (see examples above).
  4. Call EF’s .SaveChanges() to save the changes.

Because you have tracked entities then EF’s change tracking will notice it and sort out the adding or removing of rows in the hidden link table for you.

Live web site with many-to-many relationship update

As part of my GenericServices open-source project I have build two example web sites. One of them has a simple list of blog Posts, each of which has one or many Tags. The Posts and Tags are linked by a many-to-many table. This web site is live and you can try it yourself.

  • Edit a post at http://samplemvcwebapp.net/Posts and change the tags – Note: if accessed from desktop then you need to use control-click to select multiple tags.
  • You can also see the code that updates this many-to-many relationship via the open-source project SampleMvcWebApp – see the code right at the end of the DetailPostDto.cs class in the method ChangeTagsBasedOnMultiSelectList.

Part 2: Using your own link table for many-to-many relationships

There are cases when you want to have your own link table, possibly because you want to include some other property in the link table. Below is a case where I have simply created my own link table with no extra properties. As you can see the PostTagLink table has a row for every Tag / Post link just like the hidden table that EF produced in the first instance. However now that we produced this we need to keep it up to date.

Tag, TagPostLink and Post classes
Tag, PostTagLink and Post classes

So let us carry out the same type of Unit Test (connected) code and the MVC (disconnected) code we did in the first Example.

First answer: update many-to-many relationship in connected state

The two unit tests below now show that we need to manipulate our PostTagLink table entries, not the navigation properties in the Post. After we have saved the changes the Post’s AllocatedTags list will reflect the changes through EF’s relational fixup done on any tracked classes.

[Test]
public void Check25UpdatePostToAddTagOk()
{
    using (var db = new SampleWebAppDb())
    {
        //SETUP
        var badTag = db.Tags.Single(x => x.Slug == "bad");
        var firstPost = db.Posts.First();

        //ATTEMPT
        db.PostTagLinks.Add(new PostTagLink { InPost = firstPost, HasTag = badTag });
        db.SaveChanges();

        //VERIFY
        firstPost = db.Posts.Include(x => x.AllocatedTags).First();
        firstPost.AllocatedTags.Count.ShouldEqual(3);
    }
}

[Test]
public void Check26UpdatePostToRemoveTagOk()
{
    using (var db = new SampleWebAppDb())
    {
        //SETUP
        var firstPost = db.Posts.First();
        var postTagLinksToRemove =
             db.PostTagLinks.First(x => x.PostId == firstPost.PostId);

        //ATTEMPT
        db.PostTagLinks.Remove(postTagLinksToRemove);
        db.SaveChanges();

        //VERIFY
        firstPost = db.Posts.Include(x => x.AllocatedTags).First();
        firstPost.AllocatedTags.Count.ShouldEqual(1);
    }
}

I think the code speaks for itself, i.e. you add or remove rows from the PostTagLinks table to change the links.

Second answer: update many-to-many relationship in disconnected state

Just like the first example when using MVC I have filled a MultiSelectList with all the tags and it returns the ids of the tags that the user has chosen. So now I need to add/remove rows from the PostTagLinks table. However I do try not to change links that are still needed, hence I produce tagLinksToDelete and tagLinksToAdd as its more efficient.

private void ChangeTagsBasedOnMultiSelectList(SampleWebAppDb db, Post post)
{
    var requiredTagIds = UserChosenTags.GetFinalSelectionAsInts();

    var tagLinksToDelete =
        db.PostTagLinks.Where(x => !requiredTagIds.Contains(x.TagId) && x.PostId == PostId).ToList();
    var tagLinksToAdd = requiredTagIds
        .Where(x => !db.PostTagLinks.Any(y => y.TagId == x && y.PostId == PostId))
        .Select(z => new PostTagLink {InPost = post, HasTag = db.Tags.Find(z)}).ToList();

    //We get the PostTagLinks entries right, which is what EF needs
    tagLinksToDelete.ForEach(x => db.PostTagLinks.Remove(x));
    tagLinksToAdd.ForEach(x => db.PostTagLinks.Add(x));
    //********************************************************************
    //If using EF 6 you could use the more efficent RemoveRange, e.g.
    //db.PostTagLinks.RemoveRange(tagLinksToDelete);
    //db.PostTagLinks.AddRange(tagLinksToAdd);
    //********************************************************************
}

Conclusion from part 2

If you have your own link table for handling many-to-many relationships you need to

  1. Add or remove entries from you link table, in my case called PostTagLinks.
  2. Make sure the any new items, e.g. Tag, added to your link table is loaded as a tracked entity.
  3. Call EF’s .SaveChanges() to persist the changes.

Well, that took longer than I expect to write the blog, but I hope it helps others in really understanding what is going on underneath EF’s many-to-many relationships. Certainly I now feel much more confident on the topic.

Additional note: You will see I use EF commands directly and do not use a repository or UnitOfWork pattern when accessing the database. You might like to read my post on ‘Is the Repository pattern useful with Entity Framework?‘ as to why I do that.

Is the Repository pattern useful with Entity Framework?

Last Updated: February 21, 2018 | Created: May 10, 2014
Quick Summary
This is, hopefully, a critical review of whether the repository and UnitOfWork pattern is still useful with the modern implementations of Microsoft’s Entity Framework ORM. I look at the reasons why people are suggesting the repository pattern is not useful and compare this with my own experience of using the repository and UnitOfWork over many years.

This is a series:

  1. Part 1: Analysing whether Repository pattern useful with Entity Framework (this article).
  2. Part 2: Four months on – my solution to replacing the Repository pattern.
  3. UPDATE (2018): Big re-write to take into account Entity Framework Core, and further learning.

I have just finished the first release of Spatial Modeller™ , a medium sized ASP.NET MVC web application. I am now undertaking a critical review of its design and implementation to see what I could improve in V2. The design of Spatial Modeller™ is a fairly standard four layer architecture.

Four layer design drawing
Software design of Spatial Modeller Web app

I have use the repository pattern and UnitOfWork pattern over many years, even back in the ADO.NET days. In Spatial Modeller™ I think I have perfected the design and use of these patterns and I am quite pleased with how it helps the overall design.

However we don’t learn unless we hear from others that have a counter view, so let me try and summarise the arguments I have seen against the repository/Unit of Work pattern.

What people are saying against the repository pattern

In researching as part of my review of the current Spatial Modeller™ design I found some blog posts that make a compelling case for ditching the repository. The most cogent and well thought-out post of this kind is ‘Repositories On Top UnitOfWork Are Not a Good Idea’. Rob Conery’s main point is that the repository & UnitOfWork just duplicates what Entity Framework (EF) DbContext give you anyway, so why hide a perfectly good framework behind a façade that adds no value. What Rob calls ‘this over-abstraction silliness’.

Another blog is ‘Why Entity Framework renders the Repository pattern obsolete’. In this Isaac Abraham adds that repository doesn’t make testing any easier, which is one thing it was supposed to do.

I should also say I found another blog, ‘Say No to the Repository Pattern in your DAL’ which, says that using a repository removes access to Linq querying, ability to include/prefetch or aync support (EF 6). However this not true if your repository passes IQueryable items as this allows all these features.

So, are they right?

Reviewing my use of repository and UnitOfWork patterns

I have been using repositories for a while now and my reflection is that when the ORMs weren’t so good then I really needed repositories.

I build a geographic modelling application for a project to improve HIV/AIDS testing in South Africa. I used LINQ SQL, but because it didn’t support spatial parts I had to write a lot of T-SQL stored procedures and use ADO.NET to code access the spatial values. The database code was therefore a pretty complicated mix of technologies and a repository pattern acted as a good Façade to make it look seamless. I definitely think the software design was helped by the repository pattern.

However EF has come a long way since then. Spatial Modeller™ used EF 5, which was the first version to support spatial types (and enums, which is nice). I used the repository pattern and they have worked well, but I don’t think it added as much value as in the South Africa project as the EF code was pretty clean. Therefore I sympathise with the sentiments of Rob etc.

My views on the pros and cons of repository and UnitOfWork patterns

Let me try and review the pros/cons of the repository/UnitOfWork patterns in as even-handed way as I can. Here are my views.

Benefits of repository and UnitOfWork patterns (with best first)

  1. Domain-Driven Design: I find well designed repositories can present a much more domain specific view of the database. Commands like GetAllMembersInLayer make a lot more sense than a complex linq command. I think this is a significant advantage. However Rob Conery’s post suggests another solution using Command/Query Objects and references an excellent post by Jimmy Bogard called ‘Favor query objects over repositories’.
  2. Aggregation: I have some quite complex geographic data consisting of six or seven closely interrelated classes. These are treated as one entity, with one repository accessing them as a group. This hides complexity and I really like this.
  3. Security of data: one problem with EF is if you load data normally and then change it by accident it will update the database on the next SaveChanges. My repositories have are very specific what it returns, with GetTracked and GetUntracked versions of commands. And for things like audit trails I only ever return them as untracked. (see box below right if you not clear on what EF tracking is).
  4. Hiding complex T- SQL commands: As I described before sometimes database accesses need a more sophisticated command that needs T-SQL. My view these should be only in one place to help maintenance. Again I should point out that Rob Conery’s post Command/Query Objects (see item 1) could also handle this.
Not sure what EF Tracked is?
A normal EF database command using DbContext returns ‘attached’ classes. When SaveChanges() is called it checks these attached classes and writes any changed data back to the database. You need to append AsNoTracking() to the linq command if you want a read-only copy. Untracked classes are also slightly faster to load.

Non-benefits of repository and UnitOfWork patterns are:

  1. More code: Repositories and UnitOfWork is more code that needs developing, maintaining and testing.
  2. Testing: I totally agree with Isaac Abraham. Repositories are no easier to mock than IDbSet. I have tried on many occasions to mock the database, but with complex, interrelated classes it is really hard to get right. I now use EF and a test database in my unit tests for most of the complex models. It’s a bit slower but all my mocks could not handle the relational fixup that EF does.

Conclusion

I have to say I think Command/Query Objects mentioned by Rob Conery and described in detail in Jimmy Bogard’s post called ‘Favor query objects over repositories’ have a lot going for them. While the repository/UnitOfWork pattern has served me well maybe EF has progressed enough to use it directly, with the help of Command/Query Objects to make access more Domain-Driven in nature.

What I do in a case like this is build a small app to try out a new approach. I ensure that the app has a few complex things that I know have proved a problem in the past, but small enough to not take too long. This fits in well with me reviewing the design of Spatial Modeller™ before starting on version 2. I already have a small app looking at using t4 code generation to produce some of the boiler plate code and I will extend the app to try a non-repository approach. Should be interesting.

UPDATE – 4 months & 8 months later
Read the second part of this series after four months of development and my conclusions on how to replace the Repository/Unit of Work pattern with new ways of working.  Quite a journey, but I think a useful development. Have a read and see what you think. PS. There is a live example web site showing some of the approaches which you can play with and access the code.
Happy coding!

Why I’m not scared of JavaScript any more

Last Updated: May 2, 2014 | Created: April 10, 2013

My recent programming experience has been in C# and .NET, mainly on windows apps to support my business. However I needed to move to a web-based application, especially for our work in Africa.

The problem was the only bits of JavaScript I had seen up to this point was a few lines to say handle a button click. It looked very unstructured and, for a person used to OOP and C#,  rather nasty. So, what do I do in these cases… I bought a book, or in this case I bought two books. The effect was magic.

javascript design patterns bookBook 1: Learning JavaScript Design Patterns

The first book I read was Learning JavaScript Design Patterns by Addy Osmani. I bought this because a) it was on design patterns and b) it was very recent. The last is important because JavaScript is developing at such a fast pace.

This book was just what I needed. I opened my eyes to how to Construct JavaScript cleanly and provide good structure to my programming. It made me much more confident of how to write well structured JavaScript.

I really recommend this book to anyone who is a programmer but doesn’t really know about the new ways of  building JavaScript programs.

effective javascript bookBook2: Effective JavaScript

However I knew that a book on design patterns wouldn’t tell me the nitty gritty of the language. I already had the book JavaScript: The Good Parts by Douglas Crockford, but that hadn’t helped me to get inside the thinking of JavaScript.

I found a recent book called Effective JavaScript by David Herman. I have two Effective C# books and they really get into the depths of C#, so I got the JavaScript version. I was not disappointed.

Where the Design Pattern book gave me the overview the Effective JavaScript book took me deep into the inner workings of JavaScript. Also, from skim reading it to get a good idea of what to do and, more importantly, what NOT to do. This is now my bible of good coding standards in JavaScript.

Did the books help?

So, did reading these books make me a great JavaScript programmer? Of course not. But I was confident enough to design a fairly complex geospatial data visualisation system in JavaScript to work with OpenLayers.

I then when on to write the first basic module and some constructors to do the initial display of data on a map. They have a nice separation of concerns and when I wanted to refactor something it was nice and easy. It also came together quite easily with few bugs, although I really miss not having Unit Tests (that is another blog post to come!) Its likely I will get a JavaScript Guru to build the proper system, but at least I feel confident that I know what is going on now.

Maybe in a future post I will write why I actually LIKE JavaScript now because there are some bits that are really nice. But more importantly JavaScript is now the primary (only?) way ahead for responsive web design.