Updating a many to many relationship in entity framework

Quick Summary
I found conflicting information on how to update many-to-many relationships in entity framework. I therefore decided to research the problem properly. This post shares with you my research.
I believe I now understand how entity framework works with automatic many-to-many relationships and how to implement my own linking table if I need to.

I had problems working out the best way to update a many to many relationship in Entity Framework (referred to as EF). I ended up doing some research and I thought others might like to see what I found out.

Firstly the environment I am working in:

  • I am using EF 6.1 in Code First mode. I am pretty sure what I have found applies across the board, but I have only tested on EF 6.1
  • I am working in a web application (MVC 5) so I have to handle disconnected classes.

If you just want the quick answer then goto the Part 1 Conclusion. The Conclusions also have a link to a live web site where you can see a many-to-many relationship used for real (with access to the source too). However its worth reading some of the examples so you understand why it works.

UPDATE 2017: I have written a new version of this article for Entity Framework Core (EF Core) to celebrate the release of my book Entity Framework Core in Action.

EF Core does NOT work the same way as EF6.x and the new article provides information on how to handle many-to-many relationships in EF Core.


Part 1: Using EF’s automatic linking of many-to-many relationships

In this example I have simply linked that tag and the post by having ICollection<Tag> Tags in the Post class and ICollection<Post> Posts in the Tag class. See navigation properties in Tag and Post in diagram below:

Tag, Post and Blog entity framework classes (blog dimmed as not important)
Tag, Post and Blog entity framework classes

As explained in the Microsoft tutorial EF will create a linking table called TagPosts or PostTags. This link table, which you never see, provides the many to many linking. This way a post can have none to many tags and tags can be in none to many posts.

So the action I want to perform is to change the tags on a post. Sounds simple, and it is in this example if you watch out for a few things.

First answer: update many-to-many relationship in connected state

Below are two unit Unit Tests which doesn’t need to worry about disconnected classes. These are good starting points as its important to know how EF does the simplest case. The first test adds a tag to a post while the second one replaces the current tags with a new set. Note that the first post starts with two tags; ‘good’ and ‘ugly’.

[Test]
public void Check25UpdatePostToAddTagOk()
{
    using (var db = new SampleWebAppDb())
    {
        //SETUP
        var badTag = db.Tags.Single(x => x.Slug == "bad");
        var firstPost = db.Posts.First();

        //ATTEMPT
        db.Entry(firstPost).Collection( x => x.Tags).Load();
        firstPost.Tags.Add( badTag);
        db.SaveChanges();

        //VERIFY
        firstPost = db.Blogs.Include(x => x.Posts.Select(y => y.Tags)).First()
                            .Posts.First();
        firstPost.Tags.Count.ShouldEqual(3);
    }
}

[Test]
public void Check26ReplaceTagsOk()
{
    using (var db = new SampleWebAppDb())
    {
        //SETUP
        var firstPost = db.Posts.First();
        var tagsNotInFirstPostTracked =
             db.Tags.Where(x => x.Posts.All(y => y.PostId != firstPost.PostId))
                    .ToList();

        //ATTEMPT
        db.Entry(firstPost).Collection(x => x.Tags).Load();
        firstPost.Tags = tagsNotInFirstPostTracked;
        db.SaveChanges();

        //VERIFY
        firstPost = db.Blogs.Include(x => x.Posts.Select(y => y.Tags)).First()
                            .Posts.First();
        firstPost.Tags.Count.ShouldEqual(1);
    }
}

Now, you should see the two important lines (line 11 and 34) which loads the current Tags in the post. This is really important as it loads the post’s current tags so that EF can track when they are changed. There are a number of alternatives to using .Load(), that would work.

  1. Add the appropriate .Include() when loading the posts, e.g.
    var firstPost = db.Posts.Include( post => post.Tags).First()
  2. Make the Tags  property in the Post class virtual, e.g.
    public virtual Collection<Tag> Tags { get; set; }

Having used .Load(), .Include() or a virtual property then EF tracks the data and then does all the work to remove the TagLinks rows. This is very clever and very useful.

I wanted to really prove to myself that my findings were correct out so I wrote another Unit Test to test the failure case. The unit test below shows conclusively that if you don’t load the current Tags it gets the wrong result. As I said earlier the first post started with the ‘good’ and ‘ugly’ tags and should have ended up with ONLY the ‘bad’ tag. However the Unit Test shows it ended up with all three.

[Test]
public void Check05ReplaceAllTagsNoIncludeBad()
{
    using (var db = new MyDbContext())
    {
        //SETUP
        var snap = new DbSnapShot(db);
        var firstPost = db.Posts.First();
        var badTag = db.Tags.SingleOrDefault(x => x.Slug == "bad");

        //ATTEMPT
        firstPost.Tags = new List { badTag };
        db.SaveChanges();

        //VERIFY
        snap.CheckSnapShot(db, 0, 1);
        var readPost = db.Posts.Include(x => x.Tags)
                         .SingleOrDefault(x => x.PostId == firstPost.PostId);
        CollectionAssert.AreEquivalent(new[] { "good", "bad", "ugly" },
                                       readPost.Tags.Select(x => x.Slug));
    }
}

As you can see from the above example it ended up with all three, which is not the right answer.

Second answer: update many-to-many relationship in disconnected state

When working with a web application like MVC an update is done in two stages. Firstly the current data is sent to the user who updates it. Secondly when the user presses submit the new data is sent back, but its now disconnected from the database, i.e. EF is not tracking it. This makes for a slightly more complicated case, but still fairly easy to handle.

Below is my method for updating the posts. In this case I have filled a MultiSelectList with all the tags and it returns the ids of the tags that the user has chosen. I should also point out I use the same method for create and update, hence the test on line 5 to see if I need to load the current tags collection.

private void ChangeTagsBasedOnMultiSelectList(SampleWebAppDb db, Post post)
{
   var requiredTagIds = UserChosenTags.GetFinalSelectionAsInts();

   if (post.PostId != 0)
       //This is an update so we need to load the tags
       db.Entry(post).Collection(p => p.Tags).Load();

   var newTagsForPost = db.Tags
                          .Where(x => requiredTagIds.Contains(x.TagId)).ToList();
   post.Tags = newTagsForPost;
}

The important thing is that I loaded new tags from the database so they are tracked.

Conclusion from part 1

If you want to update an EF provided many-to-many link then:

  1. Pick one end of a many-to-many relationship to update. EF will sort out the other end for you.
  2. Make sure that the collection you will change is loaded, either by putting virtual on the property, using .Include() in the initial load or using .Load() later to get the collection.
    Note: .Include() is the best performing of the three as it means the data is loaded in the initial SQL command. The other two, .Load() and virtual, require a second SQL access.
  3. Make sure the any new items, e.g. Tags in my example, are loaded as a tracked entity.
    This is normal case, but in the disconnected state, i.e. in a web app, that might not be true (see examples above).
  4. Call EF’s .SaveChanges() to save the changes.

Because you have tracked entities then EF’s change tracking will notice it and sort out the adding or removing of rows in the hidden link table for you.

Live web site with many-to-many relationship update

As part of my GenericServices open-source project I have build two example web sites. One of them has a simple list of blog Posts, each of which has one or many Tags. The Posts and Tags are linked by a many-to-many table. This web site is live and you can try it yourself.

  • Edit a post at http://samplemvcwebapp.net/Posts and change the tags – Note: if accessed from desktop then you need to use control-click to select multiple tags.
  • You can also see the code that updates this many-to-many relationship via the open-source project SampleMvcWebApp – see the code right at the end of the DetailPostDto.cs class in the method ChangeTagsBasedOnMultiSelectList.

Part 2: Using your own link table for many-to-many relationships

There are cases when you want to have your own link table, possibly because you want to include some other property in the link table. Below is a case where I have simply created my own link table with no extra properties. As you can see the PostTagLink table has a row for every Tag / Post link just like the hidden table that EF produced in the first instance. However now that we produced this we need to keep it up to date.

Tag, TagPostLink and Post classes
Tag, PostTagLink and Post classes

So let us carry out the same type of Unit Test (connected) code and the MVC (disconnected) code we did in the first Example.

First answer: update many-to-many relationship in connected state

The two unit tests below now show that we need to manipulate our PostTagLink table entries, not the navigation properties in the Post. After we have saved the changes the Post’s AllocatedTags list will reflect the changes through EF’s relational fixup done on any tracked classes.

[Test]
public void Check25UpdatePostToAddTagOk()
{
    using (var db = new SampleWebAppDb())
    {
        //SETUP
        var badTag = db.Tags.Single(x => x.Slug == "bad");
        var firstPost = db.Posts.First();

        //ATTEMPT
        db.PostTagLinks.Add(new PostTagLink { InPost = firstPost, HasTag = badTag });
        db.SaveChanges();

        //VERIFY
        firstPost = db.Posts.Include(x => x.AllocatedTags).First();
        firstPost.AllocatedTags.Count.ShouldEqual(3);
    }
}

[Test]
public void Check26UpdatePostToRemoveTagOk()
{
    using (var db = new SampleWebAppDb())
    {
        //SETUP
        var firstPost = db.Posts.First();
        var postTagLinksToRemove =
             db.PostTagLinks.First(x => x.PostId == firstPost.PostId);

        //ATTEMPT
        db.PostTagLinks.Remove(postTagLinksToRemove);
        db.SaveChanges();

        //VERIFY
        firstPost = db.Posts.Include(x => x.AllocatedTags).First();
        firstPost.AllocatedTags.Count.ShouldEqual(1);
    }
}

I think the code speaks for itself, i.e. you add or remove rows from the PostTagLinks table to change the links.

Second answer: update many-to-many relationship in disconnected state

Just like the first example when using MVC I have filled a MultiSelectList with all the tags and it returns the ids of the tags that the user has chosen. So now I need to add/remove rows from the PostTagLinks table. However I do try not to change links that are still needed, hence I produce tagLinksToDelete and tagLinksToAdd as its more efficient.

private void ChangeTagsBasedOnMultiSelectList(SampleWebAppDb db, Post post)
{
    var requiredTagIds = UserChosenTags.GetFinalSelectionAsInts();

    var tagLinksToDelete =
        db.PostTagLinks.Where(x => !requiredTagIds.Contains(x.TagId) && x.PostId == PostId).ToList();
    var tagLinksToAdd = requiredTagIds
        .Where(x => !db.PostTagLinks.Any(y => y.TagId == x && y.PostId == PostId))
        .Select(z => new PostTagLink {InPost = post, HasTag = db.Tags.Find(z)}).ToList();

    //We get the PostTagLinks entries right, which is what EF needs
    tagLinksToDelete.ForEach(x => db.PostTagLinks.Remove(x));
    tagLinksToAdd.ForEach(x => db.PostTagLinks.Add(x));
    //********************************************************************
    //If using EF 6 you could use the more efficent RemoveRange, e.g.
    //db.PostTagLinks.RemoveRange(tagLinksToDelete);
    //db.PostTagLinks.AddRange(tagLinksToAdd);
    //********************************************************************
}

Conclusion from part 2

If you have your own link table for handling many-to-many relationships you need to

  1. Add or remove entries from you link table, in my case called PostTagLinks.
  2. Make sure the any new items, e.g. Tag, added to your link table is loaded as a tracked entity.
  3. Call EF’s .SaveChanges() to persist the changes.

Well, that took longer than I expect to write the blog, but I hope it helps others in really understanding what is going on underneath EF’s many-to-many relationships. Certainly I now feel much more confident on the topic.

Additional note: You will see I use EF commands directly and do not use a repository or UnitOfWork pattern when accessing the database. You might like to read my post on ‘Is the Repository pattern useful with Entity Framework?‘ as to why I do that.