EF Core: taking full control of the database schema

Entity Framework Core (EF Core) is the new(ish) version of Microsoft’s widely-used database access library. EF Core allows you to map .NET classes and properties to database tables and columns respectively, and makes writing database access code much simpler than other alternatives, like ADO.NET.

EF Core maps classes to the database by building a model of the database structure, based on the software classes and configuration you set. This means that when you change the software structure or the database structure, known as the schema, then something must happen to make them match. EF Core provides various commands to do this – you can either change the database schema based on the software structure, or change the software structure based on the database schema.

In this article, I describe an alternative way to handle a change to a database’s schema from EF Core. My approach provides a tool that will check that EF Core’s model of the database matches the actual database schema. This allows a developer to take full control of the software structure and the database structure with the confidence that the tool will catch any mistakes they make in matching the two. I will now:

  • List the two ways EF Core can update the database, and add my alternate method.
  • Describe how my alternate method works
  • Compare each approach and look at their advantages and disadvantages
  • Detail my experience of using my alternative approach in real projects

Who controls the database schema?

In EF Core terms, there are two ways in which the database scheme can be changed, or in EF Core terms ‘migrated’.

  1. Code-first: Here the classes that are mapped to the database (known as entity classes), and the EF Core’s configuration commands define what the database schema looks like. You can then run a command, such as Add-Migration, that creates code can update the database’s schema.
  2. Database-first: For the situation where you have an existing database EF Core has a command called Scaffold-DbContext that will read the schema of a given database and then produce the entity classes and the EF Core configuration to match the database.

I call my alternate approach SQL-first, for reasons that will become obvious.

  1. SQL-first. In this approach SQL change scripts controls the database structure changes. Then I use a tool I have developed to compare EF Core’s entity classes and configuration to highlight any places where they differ from the database (or the database differs from the code).

The use of SQL change scripts to change a database’s schema is well known (and liked) by many DevOps people. In fact, Microsoft recommend using SQL change scripts, created via EF Core’s migration feature, to update a production database.

The problem with writing your own SQL change scripts is making sure your changes to the database match what EF Core thinks the database looks like. This is where my new tool comes in, and in the next section I describe how I make the SQL-first approach work.

How does the SQL-first approach work?

In this part I am going to describe the steps in using the SQL-first approach. Let’s assume I want to make a change to my application to support a new feature. I work out how that affects the database and then make the following changes:

  • I change my EF Core entity classes and configuration to the form I want.
  • I create a new SQL change script that will update an existing database schema with the same change.

It doesn’t matter if I change the classes first, or the database first. What does matter is: does the database’s schema and EF Core’s internal model of the database schema match? That is where my EF Core tool, which I refer to as the EfSchemaCompare feature, comes into play.

The following figure shows the steps to check a SQL change script matches what the EF Core’s view of the database. If you follow the 1 to 5 items on the diagram it will step you through the process of ensuring the database and the software match.

As you can see, my EfSchemaCompare tool checks that my changes to the EF Core code and the update that my SQL script applied to the database match. If they don’t, then I get a list of error messages that point out what is wrong. I fix them, either in the database or my software, and run the process again.

I have not defined how you apply the SQL change scripts, but there are plenty of packages that can do that (and more). I personally have used the open-source library DbUp, which I found very useful. Whatever script application package you use needs to know what scripts has been applied, and only apply scripts that aren’t already applied to the database.

How to run the EfSchemeCompare feature

I have added the EfSchemeCompare feature to my NuGet/GitHub project EfCore.TestSupport. This library contains a whole range of methods and classes to help you unit test an application that used EF Core for database accesses. As I want to check that the EF Core’s view of the database matchs the test database in my unit tests, then putting the feature inside the EfCore.TestSupport library makes sense to me.

The code below shows an xUnit unit test that checks new database against EF Core’s view of the database. In this case there should be no errors, because the database was created using EF Core.

[Fact]
public void CompareViaContext()
{
    //SETUP
    using (var context = new BookContext(_options))
    {
        //These two commands ensure I have a new database that matches EF Core
        context.Database.EnsureDeleted();
        context.Database.EnsureCreated();

        var comparer = new CompareEfSql();

        //ATTEMPT
        //This will compare EF Core model of the database 
        //with the database that the context's connection points to
        var hasErrors = comparer.CompareEfWithDb(context);

        //VERIFY
        //The CompareEfWithDb method returns true if there were errors.
        //The comparer.GetAllErrors property returns a string
        //where each error is on a separate line
        hasErrors.ShouldBeFalse(comparer.GetAllErrors);
    }
}

In this case the hasErrors variable will be false, because the database was created by EF Core and therefore matches the EF Core Model. But in cases where the test fails I have arranged, via the comparer.GetAllErrors property, that the xUnit runner will list all the errors in the unit test output window. (The ShouldBeFalse extension method is part of the xUnit Fluent asserts. It takes a second parameter that is a string, which is output if the test fails).

Here is an alternative method where I supply a connection string to a database that I want to check. The connection string could point to any database you like, such as your production database. That gives you early warning that this version of your application cannot be deployed without first running a migration on the database.

[Fact]
public void CompareViaConnection()
{
    //SETUP
    using (var context = new BookContext(_options))
    {
        var comparer = new CompareEfSql();

        //ATTEMPT
        //The connection string points to a database that you want to check
        var hasErrors = comparer.CompareEfWithDb(connectionString, context);

        //VERIFY
        hasErrors.ShouldBeFalse(comparer.GetAllErrors);
    }
}

If this process finds an error(s) you get a series of error messages. Here are some examples

  1. DIFFERENT: MyEntity->Property ‘MyString’, column type. Expected = varchar(max), found = nvarchar(max)
    This says it found a difference in the column type of the property MyString in my entity class, MyEntity. EF Core thinks the SQL type should be varchar(max), that is a 8-bit ASCII storage type, while the actual database is nvarchar(max), that is a 16-bit Unicode storage type.
  2. NOT IN DATABASE: Entity ‘LineItem’, table name. Expected = LineItems
    This says the table LineItems that entity class called LineItem maps to wasn’t found in the database.
  3. EXTRA IN DATABASE: MyEntity->PrimaryKey ‘PK_MyEntites’, column name. Found = MyEntityId
    This tells you that a column called MyEntityId, which is (part of) a primary key was found in the database, but wasn’t in EF Core’s list of primary key properties in the entity class, MyEntity.

Version 1.1.2 of the EfSchemaCompare feature uses about 20 different types of tests, of which seven are applied to each column. This isn’t the place to describe how the EfSchemaCompare feature works (see here for a detailed description) but it is quite versatile, with lots of options. For instance, you can suppress any errors that you don’t care about – say you didn’t care about any EXTRA IN DATABASE errors, then you can add that type of error to the LogsToIgnore.

The Pros and Cons of the three approaches

I like to critique any approach I create, because there is always so sort of trade-offs when there are multiple ways to do something. Here is a table that summarises the advantages and disadvantages of each approach, with a reference in brackets, e.g. (see 1), to sections with more information.

Approach Advantages / Disadvantages
1. Code-first Advantages:
·         Easy for a software developer
·         Can use all EF Core featuresDisadvantages:
·         EF Migrations can be difficult (see 1)
·         Can’t access all SQL features (see 2)
2. Database-first Advantages:
·         Easy for a software developerDisadvantages:
·         Creates “poor” entity classes (see 3)
·         Can’t use some EF Core features (see 4)
3. SQL-first Advantages:
·         Can use all SQL features
·         Can use all EF Core features
·         Easy to use in automated updateDisadvantages:
·         Developer needs to know SQL (see 5)

Note: Currently the database-first approach in EF Core 2.0.0 requires you to delete all the code that EF Core generated when you used the Scaffold-DbContext command. That makes importing updates to the database rather tedious and time consuming. But a command that will “tweak” the generated code is on the EF Core roadmap and should be out sometime in 2018.

1. EF Migrations can be difficult

EF migrations work well for a simple system, but they start to get complex when used on a live site. There are two issues:

  1. If you have changes to the database that will lose data or change data, then EF Core can’t handle that. You will need to write extra code to the migration to deal with that.
  2. The recommendation of the EF Core team is that migrations be applied via scripts, or a command line tool, in production. This means you are into handling SQL scripts anyway.

2. Can’t access all SQL features

EF Core uses the main SQL commands, like ALTER, to change a database’s schema. EF Core is richer in what SQL parts it can do than EF6.x, with commands to set column defaults or computed columns, but there are still things it won’t do, like place a CONSTRAINT on a value column.

In addition, if a database change requires data must be moved or reformatted, then EF Core can’t help – you have to write SQL code to do that.

3. Creates “poor” entity classes

EF Core’s Scaffold-DbContext command produces entity classes which has a navigational property for all the links between tables. Personally, I only define a navigational link from one entity class to another if there is a good business/development reason for it to be there. Having every navigational link possible in an entity class obscures the correct, business-based connections that I should be using.

4. Can’t use some EF Core features

EF Core’s Scaffold-DbContext command produces entity classes that are generic in nature, with every column mapped to a property which has public read/write. This means you can’t use some of EF Core’s features that allow you to ‘hide’ data and/or relationships. These can be very useful in building robust systems by limiting how certain data is accessed.

In addition, EF Core provides a number of table mapping features, table-per-hierarchy, table splitting and owned types, which the scaffolder can’t take advantage of.

5. Developer needs to know SQL

Yes, the SQL-first approach does need you as developer to understand SQL. For some that is a down side – it’s another language to learn. For those who don’t want to do that the EF Core methods are best.

But if you do want to use SQL and don’t know it that well (I’m not an expert), then you can get some help from EF Core. In EF Core, there are two commands meant for unit testing called Database.EnsureDeleted and Database.EnsureCreated. By running those two, one after the other, you can get EF Core to output SQL to create all the tables, constraints, indexes etc. that are needed to produce a database that exactly matches what EF Core need. You can capture this SQL via EF Core’s logging (my library has an extension method to help with that), which will give you a set of SQL commands to act as template for your SQL change script.

For small changes, like adding a column to an existing table, you need to snip out the relevant part from the SQL that EF Core produced and re-craft it into a SQL ALTER TABLE command. But at least you have the correct names, types, constraints that EF Core expect to see.

My experience of using the SQL-first approach

This isn’t the first time I have developed or used a EfSchemaCompare feature – I produced such a tool for EF6.x back early 2016, and I have used it in anger on two projects. It worked well, even on a multi-person project. The benefits were:

  • Complete control over the database schema
  • Complete control over the EF entity classes and configuration
  • Ability to add/alter data via SQL scripts
  • Can produce a new database much quicker than EF can (useful for testing)
  • SQL Scripts are great for production
  • I can use the EfSchemaCompare tool to check my production database matches before deploying new code
  • If done carefully then migrations can be run out-or-order – useful when different developers are working on different parts of the database.

The new EF Core version of the EfSchemaCompare feature is far superior to my EF6.x version because of EF Core. Firstly, EF6.x’s access to the database model is poor, while EF Core’s is brilliant – it allows access to every aspect of the database mapping. This mean I can handle column SQL defaults, computed columns, and the table mapping features, table-per-hierarchy, table splitting and owned types. My EF6.x couldn’t handle any of those.

Secondly, I could access EF Core’s scaffold service that is used by the Scaffold-DbContext command. This means my compare feature will work with any relational database that EF Core supports (my old EF6.x version only worked with SQL Server). That is a great feature to have.

Also, my second implementation is better because I learnt what I needed from using the EF6.x version. My EF Core version now provides more comprehensive coverage, optional suppression of specific errors, handles multiple DbContext’s mapped to the same database (known as bounded contexts). It has a small number of limitations – see here for the latest list.

Conclusion

In this article, I have described the alternative SQL-first approach to changing a database’s schema when working with EF Core as your database access library. The SQL-first approach has some pros and cons, but I like it because it allows me to use all the great features of EF Core, and exactly define my SQL database the way I want to.

My experience of using the SQL-first approach with EF6.x was very positive, which made it a priority to create a EF Core version, which I have now done. EF Core’s features, such as the Model property and access to EF Core internal services, made it easier to create a EfSchemaCompare tool and that tool provides a better checks than the EF6.x version did.

The EfSchemaCompare tool is available via the open-source (MIT licence) EfCore.TestSupport package on NuGet, and the code and documentation can be found on GitHub.

Happy coding.

  • Pingback: Handling Entity Framework database migrations in production – part 2, Keeping EF and SQL scheme in step – The Reformed Programmer()

  • Pingback: Handling Entity Framework database migrations in production – part 4, release of EfSchemaCompare – The Reformed Programmer()

  • Shane

    “3. Creates “poor” entity classes………. Having every navigational link possible in an entity class obscures the correct, business-based connections that I should be using.”
    I actually found this useful, as business logic would certainly need to know what is changing for business operations, but all the not-so-obvious cascading dependencies that also need to be updated/deleted are unclear in code – having the navigation properties helps me to see what I need to double-check deeper down in the database.
    I’m basing this on a reasonably (sometimes very) complex production database schema.
    Out of interest, I would value your input on the simple process of double & triple checking you haven’t missed anything in these scenarios?
    Thanks for the awesome blog, your approaches / opinions are very congruent with mine. I’m really enjoying the read, and gaining a lot from it.
    -Shane

    • Hi Shane,

      I can see that having all the relational links visible help you to see the dependencies, especially when dealing with a new database. If I was working with an existing database I would use scaffolding to expose everything too. But going forward I want to control the the relationships for the following reasons:
      – I want my code to reflect the business rules I have. So, if entity A is connected to entity B,but I don’t expect entity B to access entity A I leave out the back link. That keeps reminding me that I am not expected to use that link.
      – I want to keep my code as simple and focused on the problem, so I leave unnecessary relationships out.

      Does that make sense?

      PS. Glad you like my blog. I haven’t been writing much while I was writing the book (such hard work!), but I hope to write some more articles in the coming months.