This two-part article looks at the software issues raised when you are presented with a database that was designed and built before the software was developed. Part 1 (this article) explores the low-level problems of connecting to that database using Entity Framework. Part 2 looks at the high-level problems of transforming the EF data into a form suitable for display via an ASP.NET MVC web site.
Part 1: Using Entity Framework with an existing database
In many large-scale projects software developers often have to work with existing SQL Server databases with predefined tables and relationships. The problem is that some predefined databases have aspects that are not so easy to deal with from the software side.
This is a two-part article following my experiences of putting together a web application using the AdventureWorksLT2012 database, which is a cut-down version of the larger AdventureWorks OLTP database. I am using Microsoft’s ASP.NET MVC5 (MVC) with the proprietary Kendo UI (KendoUI) package for the UI/presentation layer. The two parts of the article are:
- PART 1 (this article): The best way to get Microsoft’s Entity Framework version 6 (EF) data access tool to import and access the database.
- PART 2 (this link): This looks at the higher-level functions needed to transform that data into a user-focused display, in this case using the GenericServices open-source library to build a service/application layer that connects to a modern web front end.
Let us start on…
Using Entity Framework with an existing database
Entity Framework 6 has a number of features to make working with existing databases fairly straightforward. In this article I detail the steps I needed to take on the EF side to build a fully featured web application to work with the AdventureWorks database. At the end I also mention some other techniques that I didn’t need for AdventureWorks, but I have needed on other databases. The aim is to show how you can use EF with pre-existing databases, including ones that need direct access to T-SQL commands and/or Stored Procedures.
1. Creating the Entity Framework Classes from the existing database
Entity Framework has a well-documented approach, called reverse engineering, to create the EF entity classes and DbContext from an existing database, which you can read about here. This produces data classes with various Data Annotations to set some of the properties, such as string length and nullability (see below), plus a DbContext with an OnModelCreating method to set up the various relationships.
```csharp
namespace DataLayer.GeneratedEf
{
    using System;
    using System.Collections.Generic;
    using System.ComponentModel.DataAnnotations;
    using System.ComponentModel.DataAnnotations.Schema;

    [Table("SalesLT.Customer")]
    public partial class Customer
    {
        public Customer()
        {
            CustomerAddresses = new HashSet<CustomerAddress>();
            SalesOrderHeaders = new HashSet<SalesOrderHeader>();
        }

        public int CustomerID { get; set; }

        public bool NameStyle { get; set; }

        [StringLength(8)]
        public string Title { get; set; }

        [Required]
        [StringLength(50)]
        public string FirstName { get; set; }

        [StringLength(50)]
        public string MiddleName { get; set; }

        //more properties left out to shorten the class...

        //Now the relationships
        public virtual ICollection<CustomerAddress> CustomerAddresses { get; set; }

        public virtual ICollection<SalesOrderHeader> SalesOrderHeaders { get; set; }
    }
}
```
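Because the generated Data Annotations are the standard System.ComponentModel.DataAnnotations attributes, any code can check them, not just MVC. The sketch below is a cut-down copy of the generated Customer class (renamed CustomerAnnotationsSketch, a hypothetical name) plus a hypothetical AnnotationValidator helper that calls Validator.TryValidateObject. It is a way to see the generated rules at work, for example in a unit test, not a description of how MVC itself is wired up.

```csharp
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;

// Cut-down copy of the generated Customer class: only the annotated
// properties are kept, and the name is changed to avoid a clash.
public class CustomerAnnotationsSketch
{
    [StringLength(8)]
    public string Title { get; set; }

    [Required]
    [StringLength(50)]
    public string FirstName { get; set; }
}

public static class AnnotationValidator
{
    // Runs every validation attribute on the object's properties and
    // returns the failures (an empty list means the object is valid).
    public static List<ValidationResult> Validate(object instance)
    {
        var results = new List<ValidationResult>();
        var context = new ValidationContext(instance);
        Validator.TryValidateObject(instance, context, results, validateAllProperties: true);
        return results;
    }
}
```

A Title longer than 8 characters or a missing FirstName each produce one ValidationResult, which is the same information MVC shows next to the input field.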
This does a good job of building the classes. Certainly having the Data Annotations is very useful, as front-end systems like MVC use these for data validation during input. However, I did have a couple of problems:
- The default code generation template includes the `virtual` keyword on all of the relationships. This enables lazy loading, which I do not want (see section 1.1 below).
- The table SalesOrderDetail has a two-part key: one part is SalesOrderID and the other is an identity column, SalesOrderDetailID. EF failed on a create and I needed to fix this (see section 1.2 below).
I will now describe how I fixed these issues.
1.1 Removing lazy loading by altering the scaffolding of the EF classes/DbContext
As I said earlier, the standard templates enable ‘lazy loading’. I have been corrected in my understanding of lazy loading by some readers. The documentation states that ‘Lazy loading is the process whereby an entity or collection of entities is automatically loaded from the database the first time that a property referring to the entity/entities is accessed’. The problem with this is that it does not make for efficient SQL commands: a separate SQL SELECT command is issued for each access to a virtual relationship that has not yet been loaded, which is not such a good idea for performance.
As you will see in part 2 of this article, I use a technique that ‘shapes’ the SQL read to only load the individual properties or relationships I need. I therefore do not need, or want, lazy loading of relationships.
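To make ‘shaping’ concrete, here is a minimal sketch of a projection using LINQ’s Select. The classes CustomerSketch, OrderSketch and CustomerListDto and the helper CustomerQueries are hypothetical stand-ins for the generated classes and a display DTO, and the query is run over an in-memory list so that the block is self-contained. Given an EF DbSet instead, EF should translate the same query into SQL that reads only the selected columns, with an EXISTS check for HasSalesOrders, so no navigation collection has to be loaded.

```csharp
using System.Collections.Generic;
using System.Linq;

public class OrderSketch
{
    public int SalesOrderID { get; set; }
}

public class CustomerSketch
{
    public int CustomerID { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }

    // Not virtual, so EF6 would never lazy load this collection.
    public ICollection<OrderSketch> SalesOrderHeaders { get; set; } = new List<OrderSketch>();
}

// Hypothetical DTO holding only what a customer list page displays.
public class CustomerListDto
{
    public int CustomerID { get; set; }
    public string Name { get; set; }
    public bool HasSalesOrders { get; set; }
}

public static class CustomerQueries
{
    // Shapes the read: only three values per customer are produced,
    // rather than whole Customer entities plus their orders.
    public static List<CustomerListDto> ListCustomers(IQueryable<CustomerSketch> customers)
    {
        return customers
            .OrderBy(c => c.CustomerID)
            .Select(c => new CustomerListDto
            {
                CustomerID = c.CustomerID,
                Name = c.FirstName + " " + c.LastName,
                HasSalesOrders = c.SalesOrderHeaders.Any()
            })
            .ToList();
    }
}
```

Because the query only reads what it projects, it behaves the same whether or not lazy loading is switched on.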
Now you could hand edit each generated class to remove the ‘virtual’, but what happens if (when!) the database changes? You would then reimport the database and lose all your edits, which you or your colleague might have forgotten about by then, and suddenly your whole web application slows down. No, the common rule with generated code is not to edit it. In this case the answer is to change the code that is generated when the classes and DbContext are created. This leads me on to…
Note: You can also turn off lazy loading via the DbContext’s Configuration.LazyLoadingEnabled property, but I prefer to remove the virtual keyword as it ensures that lazy loading is definitely off.
The generation of the EF classes and DbContext is done using some T4 templates, referred to as scaffolding. By default it uses some internal scaffolding, but you can import the scaffolding and change it. There is a very clear explanation of how to import the scaffolding using NuGet here, so I’m not going to repeat it.
Once you have installed the EntityFramework.CodeTemplates you will find two files called Context.cs.t4 and EntityType.cs.t4, which control how the DbContext and each entity class are built, respectively. Even if you aren’t familiar with T4 (a great tool) you can understand what it does: it’s a code generator, and anything not surrounded by <# #> is standard text. I found the word virtual in EntityType.cs.t4 and deleted it. I also removed virtual from the declaration of the DbSet<> properties in the Context.cs.t4 file.
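EF6 lazy loading works by creating a proxy subclass that overrides the virtual navigation properties, so a class with no overridable properties cannot be lazy loaded. If you want a safety net in case someone regenerates the classes with the original templates, a unit test along these lines could check it. LazyLoadingGuard, CustomerBefore, CustomerAfter and OrderStub are hypothetical names used only for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public static class LazyLoadingGuard
{
    // Lists the public instance properties whose getter a proxy subclass
    // could override: virtual and not sealed. (Non-virtual interface
    // implementations are compiled as 'virtual final', so they are skipped.)
    public static List<string> OverridableProperties(Type entityType)
    {
        return entityType
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Where(p => p.GetGetMethod() is MethodInfo getter
                        && getter.IsVirtual
                        && !getter.IsFinal)
            .Select(p => p.Name)
            .ToList();
    }
}

public class OrderStub { }

// What the default template produces: the relationship is virtual.
public class CustomerBefore
{
    public int CustomerID { get; set; }
    public virtual ICollection<OrderStub> SalesOrderHeaders { get; set; }
}

// What the edited EntityType.cs.t4 produces: no virtual keyword.
public class CustomerAfter
{
    public int CustomerID { get; set; }
    public ICollection<OrderStub> SalesOrderHeaders { get; set; }
}
```

In a real test you would loop over the types in the DataLayer.GeneratedEf namespace and assert that the list is empty for each one.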
You may want to alter the scaffolding more extensively, say to add a [Key] attribute on primary keys for some reason. All this is possible, but you must dig into the .t4 code in more depth.
One warning about using imported scaffolding: Visual Studio threw a nasty error message when I first tried to import using the EntityFramework.CodeTemplates scaffolding (see stackoverflow entry). It took a bit of finding, but it turns out that if you have Entity Framework Power Tools Beta 4 installed then they clash. If you have Entity Framework Power Tools installed then you need to disable it and restart Visual Studio before you can import/reverse engineer a database. I hope that gets fixed, as Entity Framework Power Tools is very useful.
Note: There are two other methods to reverse engineer an existing database:
- EntityFramework Reverse POCO Code First Generator by Simon Hughes. This is a Visual Studio extension recommended by the EF guru Julie Lerman in one of her MSDN Magazine articles. I haven’t tried it, but if Julie recommends it then it’s good.
- Entity Framework Power Tools Beta 4 can also reverse engineer a database. It’s quicker, only two clicks, but it’s less controllable. I don’t suggest you use this.
1.2 Fixing a problem with how the two keys are defined in SalesOrderDetail table
The generated definition of the key parts of the SalesOrderDetail table is as follows:
```csharp
[Table("SalesLT.SalesOrderDetail")]
public partial class SalesOrderDetail
{
    [Key]
    [Column(Order = 0)]
    [DatabaseGenerated(DatabaseGeneratedOption.None)]
    public int SalesOrderID { get; set; }

    [Key]
    [Column(Order = 1)]
    public int SalesOrderDetailID { get; set; }

    //other properties left out for clarity...
}
```
You can see it marks the first as not database generated, but it does not mark the second as an Identity key. This caused problems when I tried to create a new SalesOrderDetail so that I could add a line item to an order. I got the SQL error:
Cannot insert explicit value for identity column in table ‘SalesOrderDetail’ when IDENTITY_INSERT is set to OFF.
That confused me for a bit, as other two-key items, such as CustomerAddress, had worked. I tried a few things, but as it looked like an EF error I tried telling EF that SalesOrderDetailID was an Identity key by using the attribute [DatabaseGenerated(DatabaseGeneratedOption.Identity)]. That fixed it!
The best solution would be to edit the scaffolding again to always add that attribute to identity keys. That needed a bit of work and the demo was two days away, so in the meantime I added the needed attribute using the MetadataType attribute and a ‘buddy’ class. This is a generally useful feature, so I use this example to show you how to do this in the next section.
1.3 Adding new DataAnnotations to EF Generated classes
Being able to add attributes to properties in already generated classes is a generally useful thing to do. I needed it to fix the key problem (see 1.2 above), but you might want to add some DataAnnotations to help the UI/presentation layer such as marking properties with their datatype, e.g. [DataType(DataType.Date)]. The process for doing this is given in the Example section of this link to the MetadataType attribute. I will show you my example of adding the missing Identity attribute.
The process requires me to add a partial class in another file (see section 3 coming later for more on this), put the [MetadataType(typeof(SalesOrderDetailMetaData))] attribute on that partial class, and then declare the SalesOrderDetailID property, carrying the attribute I want, in a new class, sometimes called a ‘buddy’ class. See below:
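```csharp
using System.ComponentModel.DataAnnotations;
using System.ComponentModel.DataAnnotations.Schema;

namespace DataLayer.GeneratedEf
{
    [MetadataType(typeof(SalesOrderDetailMetaData))]
    public partial class SalesOrderDetail : IModifiedEntity
    {
    }

    public class SalesOrderDetailMetaData
    {
        [DatabaseGenerated(DatabaseGeneratedOption.Identity)]
        public int SalesOrderDetailID { get; set; }
    }
}
```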
The effect is to apply those attributes to the existing properties. That fixed my problem with EF creating new SalesOrderDetail rows, and I was away.
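To see how the buddy class is linked to the entity, the sketch below does the lookup by hand with reflection: it checks the property on the entity first, then the matching property on the class named in [MetadataType]. This is only an illustration of the idea; BuddyClassReader is a hypothetical helper, and the framework code that actually reads buddy classes is more involved. The two SalesOrderDetail halves are cut-down copies of the generated class and the partial class shown above.

```csharp
using System;
using System.ComponentModel.DataAnnotations;
using System.ComponentModel.DataAnnotations.Schema;
using System.Reflection;

// "Generated" half (cut down): SalesOrderID carries its own attribute.
public partial class SalesOrderDetail
{
    [DatabaseGenerated(DatabaseGeneratedOption.None)]
    public int SalesOrderID { get; set; }

    public int SalesOrderDetailID { get; set; }
}

// Hand-written half: points at the buddy class.
[MetadataType(typeof(SalesOrderDetailMetaData))]
public partial class SalesOrderDetail
{
}

public class SalesOrderDetailMetaData
{
    [DatabaseGenerated(DatabaseGeneratedOption.Identity)]
    public int SalesOrderDetailID { get; set; }
}

public static class BuddyClassReader
{
    // Returns the DatabaseGeneratedOption for a property, looking on the
    // entity first and then on its [MetadataType] buddy class; null if neither has one.
    public static DatabaseGeneratedOption? GetGeneratedOption(Type entityType, string propertyName)
    {
        var direct = entityType.GetProperty(propertyName)?.GetCustomAttribute<DatabaseGeneratedAttribute>();
        if (direct != null) return direct.DatabaseGeneratedOption;

        var buddyType = entityType.GetCustomAttribute<MetadataTypeAttribute>()?.MetadataClassType;
        var fromBuddy = buddyType?.GetProperty(propertyName)?.GetCustomAttribute<DatabaseGeneratedAttribute>();
        return fromBuddy?.DatabaseGeneratedOption;
    }
}
```

Note that the buddy property is matched purely by name, so a typo in the buddy class silently does nothing.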
2. What happens when the database changes?
Once the scaffolding is sorted as discussed above, you just repeat step 1, ‘Creating the Entity Framework Classes from the existing database’. There are a few things you need to do before, during and after the reimport:
- You should remember/copy the name of the DbContext so that you use the same name when you reimport. That way the code will recompile without major name changes.
- Because you are using the same name as the existing DbContext, you must delete the previous DbContext, otherwise the reimport process will fail. If it’s easier you can delete all the generated files, as they are replaced anyway. That is why I suggest you put them in a separate directory with no other files added.
- When reimporting, the process will by default add the connection string to your App.config file again. I suggest you untick that option, otherwise you end up with lots of connection strings (a minor point, but it can be confusing).
- If you use source control (I really recommend you do) then a quick compare of the files to check what has changed is worthwhile.
3. Adding new properties or methods to the Entity classes
What if you want to add more properties or methods to a class? Clearly I can’t add properties that change the database: I would have to talk to the DBA to change the database definition and import the new database schema again. However, in my case I wanted to add properties that accessed existing database properties to produce more useful output, or to have an intention-revealing name, like HasSalesOrders.
You can do this because the scaffolding produces ‘partial’ classes, which means I can have another file that adds to the class. To do this the new class must:
- Be in the same namespace as the generated classes
- Be declared as public partial <same class name>.
I recommend you put them in a different folder to the generated files. That way they will not be overwritten by accident when you recreate the generated files (note: the namespace must be the original namespace, not that of the new folder). Below I give an example where I added to the Customer class. Ignore for now the IModifiedEntity interface (dealt with later in this article) and the [Computed] attribute (dealt with in part 2 of the article).
```csharp
using System.Linq;

namespace DataLayer.GeneratedEf
{
    public partial class Customer : IModifiedEntity
    {
        [Computed]
        public string FullName
        {
            get { return Title + " " + FirstName + " " + LastName + " " + Suffix; }
        }

        /// <summary>
        /// This is true if the customer has any sales orders. We use this
        /// to decide if a 'Customer' has actually bought anything
        /// </summary>
        [Computed]
        public bool HasSalesOrders
        {
            get { return SalesOrderHeaders.Any(); }
        }
    }
}
```
Note that you will almost certainly want to add to the DbContext class too (I did, see section 4 below). This is also defined as a partial class, so you can use the same approach. Which leads me on to…
4. Dealing with properties best handled at the Data Layer
In the AdventureWorks database there are two properties called ‘ModifiedDate’ and ‘rowguid’. ModifiedDate needs to be updated on create or update and the rowguid needs to be added on create.
Many databases have properties like this, and they are best dealt with at the Data/Infrastructure layer. With EF this can be done by providing a partial class for the DbContext and overriding the SaveChanges() method to handle the specific issues your database needs. In the case of AdventureWorks I added an IModifiedEntity interface to each partial class that has ModifiedDate and rowguid properties.
Then I added the code below to the AdventureWorksLt2012 DbContext to provide the functionality required by this database.
```csharp
using System;
using System.Data.Entity;
using System.Data.Entity.Infrastructure;
using System.Linq;

public partial class AdventureWorksLt2012 : IGenericServicesDbContext
{
    /// <summary>
    /// This has been overridden to handle ModifiedDate and rowguid
    /// </summary>
    /// <returns></returns>
    public override int SaveChanges()
    {
        HandleChangeTracking();
        return base.SaveChanges();
    }

    /// <summary>
    /// This handles going through all the entities that have
    /// changed and seeing if we need to do anything.
    /// </summary>
    private void HandleChangeTracking()
    {
        foreach (var entity in ChangeTracker.Entries()
            .Where(e => e.State == EntityState.Added || e.State == EntityState.Modified))
        {
            UpdateTrackedEntity(entity);
        }
    }

    /// <summary>
    /// Looks at everything that has changed and
    /// applies any further action if required.
    /// </summary>
    /// <param name="entityEntry"></param>
    private static void UpdateTrackedEntity(DbEntityEntry entityEntry)
    {
        var trackUpdateClass = entityEntry.Entity as IModifiedEntity;
        if (trackUpdateClass == null) return;
        trackUpdateClass.ModifiedDate = DateTime.UtcNow;
        if (entityEntry.State == EntityState.Added)
            trackUpdateClass.rowguid = Guid.NewGuid();
    }
}
```
The IModifiedEntity interface is really simple:
```csharp
using System;

//This interface is added to all the database entities
//that have a modified date and rowGuid. Save Changes uses this
//to find entities that need the date updating, or a new rowguid added
public interface IModifiedEntity
{
    DateTime ModifiedDate { get; set; }
    Guid rowguid { get; set; }
}
```
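One possible refinement, sketched below under my own assumptions, is to pull the rules in UpdateTrackedEntity out into a small static helper. The rules can then be unit tested without a database, and the same helper can be reused for asynchronous saves: in EF6, SaveChangesAsync does not go through SaveChanges(), so an async save would otherwise skip the ModifiedDate/rowguid handling. ModifiedEntityStamper and ProductSketch are hypothetical names; IModifiedEntity is copied from above so the block compiles on its own.

```csharp
using System;

// Copied from the interface above so this sketch is self-contained.
public interface IModifiedEntity
{
    DateTime ModifiedDate { get; set; }
    Guid rowguid { get; set; }
}

// Hypothetical helper holding the same rules as UpdateTrackedEntity.
public static class ModifiedEntityStamper
{
    // Always stamps ModifiedDate; only a newly added entity gets a new rowguid.
    public static void Stamp(IModifiedEntity entity, bool isNew, DateTime utcNow)
    {
        entity.ModifiedDate = utcNow;
        if (isNew)
            entity.rowguid = Guid.NewGuid();
    }
}

// Stand-in entity for testing; in the real code this would be a generated
// class (e.g. Customer) with IModifiedEntity added in a partial class.
public class ProductSketch : IModifiedEntity
{
    public DateTime ModifiedDate { get; set; }
    public Guid rowguid { get; set; }
}
```

Inside the DbContext, UpdateTrackedEntity would then call ModifiedEntityStamper.Stamp(trackUpdateClass, entityEntry.State == EntityState.Added, DateTime.UtcNow), and an override of SaveChangesAsync(CancellationToken) would call HandleChangeTracking() before returning base.SaveChangesAsync(cancellationToken). Passing the time in as a parameter keeps the helper deterministic in tests.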
5. Using SQL Stored Procedures
Some databases rely on SQL Stored Procedures (SPs) for insert, update and delete of rows in a table. AdventureWorksLT2012 did not, but if you need to, EF 6 has added a neat way of linking to stored procedures. It’s not trivial, but you can find good information here on how to do it.
Clearly if the database needs SPs for CUD (Create, Update and Delete) actions then you need to use them. However, using EF’s CUD actions is easier from the software point of view, and EF’s CUD actions have some nice features. For instance, EF holds an in-memory copy of the original values and uses this to work out what has changed.
The benefit is that EF updates are efficient: if you update one property, only that column in the row is updated. The more subtle benefit concerns SQL security: if you use SQL column-level security (GRANT/DENY), an update only writes the columns whose properties have changed, so an unchanged column that the user is denied access to does not trigger a permission error. This is a bit of an esoteric feature, but I have used it and it works well.
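The idea behind these efficient updates can be shown with a simplified sketch: compare the original values with the current values and put only the differing columns in the UPDATE. This is not EF’s implementation (in EF6 you can inspect the real values through DbEntityEntry.OriginalValues and CurrentValues); ChangedColumnsSketch is a hypothetical helper and the SQL string it builds is illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ChangedColumnsSketch
{
    // Returns the (ordinal-sorted) names of columns whose current value
    // differs from the original value, or that had no original value.
    public static List<string> ChangedColumns(
        IReadOnlyDictionary<string, object> original,
        IReadOnlyDictionary<string, object> current)
    {
        return current
            .Where(kv => !original.TryGetValue(kv.Key, out var before) || !Equals(before, kv.Value))
            .Select(kv => kv.Key)
            .OrderBy(name => name, StringComparer.Ordinal)
            .ToList();
    }

    // Builds an UPDATE that sets only the changed columns, using
    // @-named parameters rather than inlined values.
    public static string BuildUpdate(string table, string keyColumn, IEnumerable<string> changedColumns)
    {
        var setList = string.Join(", ", changedColumns.Select(c => $"[{c}] = @{c}"));
        return $"UPDATE {table} SET {setList} WHERE [{keyColumn}] = @{keyColumn}";
    }
}
```

If only Title changes, the statement never mentions the other columns, which is why column-level DENY permissions on unchanged columns are not hit.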
6. Other things you could do
This is all I had to do to get EF to work with an existing database, but there are other things I have had to use in the past. Here is a quick run through of other items:
6a. Using Direct SQL commands
Sometimes it makes sense to bypass EF and use a SQL command, and EF has all the commands to allow you to do this. The EF documentation has a page on this here which gives a reasonable overview, but I recommend Julie Lerman’s book ‘Programming Entity Framework: DbContext’, which goes into this in more detail (note: this book is very useful, but it covers an earlier version of EF so it misses some of the latest commands, like the use of SPs in Insert, Update and Delete).
For certain types of reads SQL makes a lot of sense. For instance, in my GenericSecurity library I need to read the current SQL security setup (see below). I think you will agree it makes a lot of sense to do this with a direct SQL read rather than defining multiple data classes just to build the command.
```csharp
var allUsers = db.Database.SqlQuery<SqlUserAndRoleRow>(
    @"select mp.name as UserName, rp.name as RoleName, mp.type as UserType
      from sys.database_role_members drm
      join sys.database_principals rp on (drm.role_principal_id = rp.principal_id)
      join sys.database_principals mp on (drm.member_principal_id = mp.principal_id)
      ORDER BY UserName");
```
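The SqlUserAndRoleRow class is not shown above. Database.SqlQuery<T> maps each result column to the property of T with the same name, so the column aliases in the SQL (UserName, RoleName, UserType) decide the property names. A minimal sketch of a matching class follows; the property types and the RolesByUser grouping helper are my assumptions, not code from the GenericSecurity library.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical row class: property names match the SQL column aliases.
public class SqlUserAndRoleRow
{
    public string UserName { get; set; }
    public string RoleName { get; set; }
    // sys.database_principals.type is a char(1) code, e.g. "S" for a SQL user.
    public string UserType { get; set; }
}

public static class SecurityReport
{
    // Groups the flat rows into user -> roles, keeping the row order
    // produced by the SQL ORDER BY within each user.
    public static Dictionary<string, List<string>> RolesByUser(IEnumerable<SqlUserAndRoleRow> rows)
    {
        return rows
            .GroupBy(r => r.UserName)
            .ToDictionary(g => g.Key, g => g.Select(r => r.RoleName).ToList());
    }
}
```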
For SQL commands such as create, update and delete it is less obvious, but I have used it in some cases. For these you use the ExecuteSqlCommand method; see the example from Microsoft below:
```csharp
using (var context = new BloggingContext())
{
    context.Database.ExecuteSqlCommand(
        "UPDATE dbo.Blogs SET Name = 'Another Name' WHERE BlogId = 1");
}
```
Neither of these examples has parameters, but if you do need any, the SqlQuery and ExecuteSqlCommand methods both accept them. Values passed that way are sent as SQL parameters rather than concatenated into the SQL text, which protects against SQL injection attacks. The Database.SqlQuery Method documentation shows this.
One warning on SQL commands. Once you have run a SQL command, EF’s view of the database, some of which is held in memory, may be out of date. If you are going to close/dispose of the DbContext straight away then that isn’t a problem. However, if the command is followed by other EF accesses, read or write, then you should use EF’s Reload method to get EF back on track. See my stackoverflow answer here for more on this.
6b. SQL Transaction control
When you use EF’s SaveChanges() method to do database updates, all the changes are done in one transaction, i.e. if one fails then none of the updates are committed. However, if you are using raw SQL updates, or a combination of EF and SQL updates, you may well need these to be done in one transaction. Thankfully EF version 6 introduced commands to allow you to control transactions.
I used these commands in my EF code to work with SQL security. I wanted to execute a set of SQL commands to set up SQL Security roles and grant/deny access, but if any one failed I wanted to roll back. The code to execute a sequence of sql commands and rollback if any single command fails is given below:
```csharp
using (var dbContextTransaction = db.Database.BeginTransaction())
{
    try
    {
        foreach (var text in sqlCommands)
        {
            db.Database.ExecuteSqlCommand(text);
        }
        dbContextTransaction.Commit();
    }
    catch (Exception ex)
    {
        dbContextTransaction.Rollback();
        //report the error in some way
    }
}
```
You can also use the same commands to combine SQL commands and EF commands in one transaction. See this EF documentation for an example of that.
Conclusion
There were a few issues to sort out, but all of them were fixable. Overall, getting EF to work with an existing database was fairly straightforward, once you know how. The problem I had with the two-part key (see section 1.2) was nasty, but now that I, and you, know about it we can handle it in the future.
I think the AdventureWorks Lite database is complex enough to be a challenge: it has lots of relationships, composite primary keys, computed columns, nullable properties etc. Therefore getting EF to work with AdventureWorks is a good test of EF’s capability to work with existing SQL databases. While the AdventureWorks Lite database did not need any raw SQL queries or Stored Procedures, other projects of mine have used these, and I have mentioned some of these features at the end of the article to complete the picture.
In fact version 6 of EF added a significant amount of extra features and commands to make mixed EF/SQL access very possible. The more I dig into things, the more goodies I find in EF 6. For instance, EF 6 brought in retry logic for Azure, handling of transaction commit failures, SQL transaction control, improved connection sharing between SQL and EF, plus a number of other things. Have a good look around the EF documentation: there is a lot there.
So, there is no need to hold back on using Entity Framework on your next project that has to work with an existing SQL database. You can use it in a major role as I did or, now that connection sharing is good, just use it for the simple CRUD cases that do not need heavy T-SQL methods. Happy coding.
Hello Sir.
1.2 Fixing a problem with how the two keys are defined in SalesOrderDetail
The [DatabaseGenerated(DatabaseGeneratedOption.Identity)] solution works only if I insert a single record. However, if I AddRange a list, then I get the ‘IDENTITY_INSERT is set to OFF’ error again. How can I solve it? Thank you
Hi Fkbeys,
There was a change (I think in EF Core 5) in how you define a class with multiple primary keys. The correct way to do this is by using the HasKey method; see Configuring a primary key in the EF Core docs.
PS. I can’t update this old article because the WordPress format has changed.
Excellent!!!
Hi Eduardo. Thanks for the comment and glad you liked it.
Very nice article!
One thing which confuses me though is when you mentioned “You should remember/copy the name of the DbContext so you use the same name when you reimport”? To me it looks like each time I need to add new Entity Data Model with same name using the “Entity Data Model Wizard” with “Code First from database” option.
Thanks for your help!
Sam
Hi Sam,
Glad you found it useful, but sorry my comment didn’t quite make sense. The point I was trying to get across is that to reimport the database you have to delete the existing DbContext, otherwise the import says something like ‘a DbContext of the name xxx already exists’. However, when you reimport the changed database you want to be sure you get exactly the same DbContext name as before, otherwise the references to the DbContext in your existing code will now show up as compilation errors. Does that make more sense?
It’s a small thing, but if you are making constant tweaks to the database and reimporting, which I often do at the beginning of a project, you can end up with more edits than you should if you change the DbContext name by accident.