Handling Entity Framework database migrations in production – part 2, Keeping EF and SQL scheme in step

Last Updated: November 23, 2017 | Created: November 4, 2015

This is the second article in a series about using Microsoft’s data access technology, Entity Framework (EF) in an application that to be ‘always on’, in my case an ASP.NET MVC e-commerce web site, yet is still being developed so there is a need to update the database schema, i.e. the tables, views, constraints etc. that define how the data is held in the database.

In this article I look at the problem of keeping EF’s view of the database schema, known as the EF database model, in step with the actual SQL database schema. The list of articles will be:

UPDATE: I have released a version for Entity Framework Core – see this article.

I will start by talking about how EF works, especially around how EF builds its in-memory view of the database, known as the database model, and what happens when it applies database migrations.

Note: Please see the first article for my reasons why I decided EF’s standard migration feature was not sufficient.

Matching Entity Framework changes to Database changes

EF is great to develop with as it does a number of clever things in the backgrounds to help developers. One of the main useful things EF does is make a relational database look like a set of linked classes. This is very clever, but to do this EF’s view of the database relationships must match the reality of what is in the d. To understand why this is problem then I need to describe how EF knows about the database schema.

How EF finds out about the database schema

EF holds a view of what it thinks the database schema is in what it calls the ‘database model’. This is a set of metadata that it builds at application start-up from the EF DbContext class and the associated classes referred to by your DbContext. There are therefore two main ways of telling EF what the database schema looks like. They are:

  1. EF uses your code to define the database schema.
    In this mode the developer normally defines/designs the classes and configures a special class called DbContext, which EF uses to calculate the EF metadata model, i.e. its view of what the database schema should look like. Then used what EF calls a ‘database initializer’ to create the database. There are then ways to update the database if you change the EF database model, see Code First Migrations.
  2. EF imports the schema of and existing database to create the data classes.
    The other way is to allow EF to build the classes and DbContext based on the schema of an existing database. It does this by a one-time scan of the database schema and using templates to build the classes and DbContext (see another of my articles which describes this in detail). If you change the database then you can either re-import database schema again, or there is also a way of using Code First Migrations with an existing database to handle update.

The important thing to see is that, other than the one-time scan in point 2, EF never looks at the schema of the database, even when it is doing migrations. It just assumes the database in the state that EF’s metadata model says it is. (For more on how this works under the hood then see Max Vasilyev’s interesting article about how EF migrations are calculated and stored).

What happens if EF’s database model is not correct?

EF will only find out there is a mismatch between its database model and the actual database schema when it accesses the part of the database that does not match. I have not tried all possible combinations, but the error I got when I added a properly called ‘EfOnlyProperty’ to an EF class without a corresponding column in the table was:

System.Data.Entity.Infrastructure.DbUpdateException : An error occurred while updating the entries. See the inner exception for details. —->
System.Data.Entity.Core.UpdateException : An error occurred while updating the entries. See the inner exception for details. —->
System.Data.SqlClient.SqlException : Invalid column name ‘EfOnlyProperty’.

Another example is when EF creates a database then it can add some hidden tables to facilitate many-to-many relationships (see my article on how EF handles many-to-many relationships). This means that any changes to the database must include these extra tables for EF to work properly. If a class called ‘Parent’ has a many-to-many relationship with a class called ‘Child’ and the linking table is missing then you get this error.

System.Data.Entity.Infrastructure.DbUpdateException : An error occurred while saving entities that do not expose foreign key properties for their relationships. The EntityEntries property will return null because a single entity cannot be identified as the source of the exception. Handling of exceptions while saving can be made easier by exposing foreign key properties in your entity types. See the InnerException for details. —->
System.Data.Entity.Core.UpdateException : An error occurred while updating the entries. See the inner exception for details. —->
System.Data.SqlClient.SqlException : Invalid object name ‘dbo.ParentChild’.

I needed an EF/SQL sanity checker

The idea of changing my database and having EF out of step was not appealing at all. That way you just don’t know what will happen, which is deadly.

I could build the database using SQL and use EF’s import existing database method described earlier. However I know from experience that it can be a pain when database updates come along and you need to re-import again. Also EF’s database import uses set property names and exposes every relationship, which isn’t ideal for good programming practice.

So I wanted to use EF Code First approach yet define the database through T-SQL. That meant I needed something to check that my EF/software view of the database schema and the actual SQL database schema were in step. Ideally this would be something I could run as part on my Unit Tests so that any mismatch shows up immediately.

My solution to comparing EF/SQL databases

I therefore created some software that compared the EF database model metadata against the actual SQL schema as read from a database. This method runs as a Unit Test and reads the EF database model metadata and reads the actual SQL database schema. It then compares the following:

  • Each table: does a table exist for each EF class?
  • Each column: does a column match EF class properties in name, type/nullable, size, primary key(s) and key order?
  • Each relationship:
    • Do SQL foreign key constraints exist for each EF relationship?
    • Have the extra many-to-many tables that EF used been configured properly?
    • Are the Cascade Deletes the same between EF and SQL?

If any of the above is out of step then the Unit Test fails and outputs useful error messages. At the same time it also produces warnings for tables and/or columns in the SQL database that EF does not use. These should not cause a problem to EF, but might show something that EF has missed.

Here is an example of a call to CompareEfWithDb to check that the EF’s model matches the schema of the database that the ‘YourDbContext’ is connection string points to:

using (var db = new YourDbContext())
{
    var comparer = new CompareEfSql();

    var status = comparer.CompareEfWithDb(db); 
    //status.IsValid is true if no errors. 
    //status.Errors contains any errors.  
    //status.Warnings contains any warnings
}

In my Unit Test I fail the test if status returned is not valid and I print out any error messages. I also tend to fail the test on warnings too, as it often points to something I have missed.

Another slightly different method has an extra parameter with a connection string to a separate database. This allows me to check a different database schema, e.g. my production database, against the current EF model. By running this as an extended Unit Test before deployment I make sure that I know if the production database needs to be updates before the new software is deployed. This gives me great peace of mind that my SQL databases are in line with the current EF data classes.

Here are two examples of the type of error output I get when running the CompareEfWithDb method. The first is a simple example is of a missing column in the database. The error message from calling the CompareEfWithDb method is:

Missing Column: the SQL table [dbo].[Child] does not contain a column called EfOnlyProperty. Needed by EF class Child.

The second example is a class called ‘Parent’ which has many-to-many relationship with a class called ‘Child’ and the linking table is missing. The error messages from calling the CompareEfWithDb method are:

  1. Missing Link Table: EF has a Many-to-Many relationship between Parent.ManyChildren and Child but we could not find a linking table with the right foreign keys.
  2. Missing Link Table: EF has a Many-to-Many relationship between Child.ManyParents and Parent but we could not find a linking table with the right foreign keys.

Add CompareSqlToSql for a fuller picture

I found the CompareEfWithDb method very helpful, but it doesn’t quite cover everything. At time goes on I am planning to move some of the more complex access to SQL Stored Procedures (SPs) to both gain better performance and facilitate the decoupling of the database from the software. EF can use SPs but they don’t appear in EF database model, so CompareEfWithDb couldn’t help.

Having created all the code for CompareEfWithDb it was pretty simple to create a SQL to SQL compare, called CompareSqlToSql. This could check one database against another, and can include a comparison of other SQL features. It does a slightly fuller comparison that the EF/SQL does as we can compare everything rather than surmising some things, like many-to-many table settings, from EF. It can also check non-EF parts like SPs and their parameters.

Note: You still need good integration and system level testing to catch anything that these tools miss.

Using SQL compare to understand what EF is doing

I should say that CompareSqlToSql has proved to be much more useful than just checking SPs. It turns out that EF is somewhat forgiving if you make mistakes when you use EF’s Data Annotations or EF’s Fluent API to alter the way the database works. For instance if you configure a required to optional relationship but give it a non-nullable foreign key it will add its own hidden nullable foreign key. In fact I found about three errors like this in my 16-table application, all around relationships.

If you want to create a database that the EF database can use you need to fix these errors in your code, or at least replicate what EF has done for the database to work. CompareEfWithDb won’t spot them for you, but comparing an EF-generated database with your script-generated database will find them. To do this you need to:

1. Create a new database using the DbContext method .Database.Create() then you have a SQL database based on EF’s database model. We will call it MyEfDb and I give the code below:

using (var db = new YourDbContext(MyEfDbConnectionString))
{
    if (db.Database.Exists())
    {
        db.Database.Delete();
    }
    db.Database.Create();
}

2. Then create a database using your scripts and DbUp (see first article). We will call it MySqlDb.

3. Now run CompareSqlToSql (or another compare scheme tool – see below) with MyEfDb as the reference database and MySqlDb as the ‘to be checked’ database.

You can spot EF corrections to your minor configuration mistakes mainly by the column names, which normally contain the name with a _ in the middle. I have also found places where the Cascade delete option was different, again through me misconfiguring a relationship.

Using a SQL compare to build your scripts

The other reason for doing a SQL compare of an EF-generated database with your current script-generated database is to build the scripts you need to update your database after you have made changes to the EF classes. There are various tools that can compare two SQL databases and provide a script to update one to the other – see this useful list on stackoverflow, although it is a bit old. Note: some of these are commercial products that you need to buy.

If you don’t have one of these then the output of CompareSqlToSql will show all the SQL differences, but not in a nice SQL script way. For instance if we repeat the many-to-many example above, with a class called ‘Parent’ which has many-to-many relationship with a class called ‘Child’ and the linking table is missing. The error messages from CompareSqlSql are:

  1. Missing Table: The ‘MyEfDb’ SQL database has a table called [dbo].[ParentChild], which is missing in the MySqlDb’ database.
  2. Missing Foreign key: The ‘MyEfDb’ SQL database has a foreign key Parent: ParentChild.Parent_ParentId, Referenced: Parent.ParentId, which is missing in the ‘MySqlDb’ database.
  3. Missing Foreign key: The ‘MyEfDb’ SQL database has a foreign key Parent: ParentChild.Child_ChildId, Referenced: Child.ChildId, which is missing in the ‘MySqlDb’ database.

Note: If anyone is interested in the CompareEfWithDb and CompareSqlToSql Unit Test methods I could make the package publicly available. However the first version was a massive 2-day hack and does not handle all possible EF combinations, such as TPT and TPH inheritance, complex types etc. It therefore needs a lot more work before it can be released for general use.

Telling Entity Framework that you will handle migrations

The last thing we need to do is stop EF handling database changes when you change your EF code. If you don’t turn this off then when you change EF database classes EF will block you from running the application until it has updated the database, using whatever EF ‘Database initializers’ is set up (EF’s default initializer is CreateDatabaseIfNotExists) .

To stop EF trying to handle migrations we have to do is provide a null database Initializer. There are two ways of doing this (you only need one):

  1. Call ‘SetInitializer<YourDbContext>(null)’ at startup
  2. Add the following to the <appSettings> part of your web/application config file, e.g.
<appSettings>
    <add key="DatabaseInitializerForType YourApp.YourDbContext, YourApp" value="Disabled" />
</appSettings>

I personally add the appSetting to my config file as that way I know it is done.

Note: See this useful documentation for more on null database Initializers.

Conclusion

Well done for reading this quite long article. In this post I have described how EF keeps its model view in line with the actual database schema. From this we see it does not read the schema other than in one very limited situation of importing an existing database. I also describe what happens if EF’s database model and the actual database are out of step.

I then go on to describe a Unit Test tool that I have developed to check if EF’s database model is the same as the actual SQL model. However you can use the technique in the section ‘Using SQL compare to build the right update scripts’ to create two database and then use some sort of SQL compare tool to get the difference script.

Note: If you are interested in using the CompareEfWithDb and CompareSqlToSql Unit Test methods then let me know. I won’t be able to get a release out of this package for some time as it needs a lot of extra work to support all EF’s features and I am busy working on another project.

Finally I describe how to stop EF from trying to look after changes in the database model, which you need to do if you are taking over the job of handling database migrations.

Happy coding!

Postscript

I am now using this approach on my existing e-commerce project and I needed to add my first change the the database schema. The change was simple, adding two bytes to an existing table, which is non-breaking change to the database. It is worth comparing the new process with the old process.

Previously I was using EF data migrations such a change used to take me 30 minutes+. Having made the change to the data classes I needed to check those changes by updating/recreating the Unit Test database. This I had to do in a different way to the local/azure databases because the DatabaseInitializer was different, which required me to editing out some code, deleting the Unit Test database, and then resorting some code. Also, because the way I updated the Unit Test was different to the local/production databases I needed to check everything again when I undated the local/azure databases .

In comparison using the process laid out in these two articles it took me less than 15 minutes to make the changes on all three databases: Unit Test database, local developer database and Azure. I use the similar method to update all three database, but in the case of the Unit Test I also do a complete delete/setup to ensure there isn’t a problem later. However once I had changed the Unit Test database and ran CompareEfWithDb and CompareSqlToSql I was very sure that the other two updates would work. Much less stressful.

I also found it more natural to define the alter table in T-SQL. Maybe its just my preference, but you do get Intellisence when writing the T-SQL script in Visual Studio.

Nice to see it working for real.