Entity Framework: working with an existing database – Part 2, MVC

Last Updated: December 16, 2014 | Created: November 16, 2014
Quick Summary
This two-part article looks at the software issues raised when presented with a database that was designed and built prior to software development. Part 1 explores the low-level problems of connecting to that database using Entity Framework (EF). Part 2 (this article) looks at the high-level problems of transforming the EF data into a form suitable for display via an ASP.NET MVC web site.

Part 2: ASP.NET MVC and Entity Framework with an existing database

  1. PART 1. The best way to get Microsoft’s Entity Framework version 6 (EF) data access tool to import and access the database.
  2. PART 2: (this article) This looks at the higher-level functions needed to transform that data into a user-focused display, in this case using the GenericServices open-source library to build a service/application layer that connects to a modern web front end using ASP.NET MVC5.

Introduction

There are many examples of how to build ASP.NET MVC (MVC) web applications on the web. However, in order to get the ideas across they often pick rather simplistic database schemas, as I did myself when I built the SampleMvcWebApp. The problem with these examples is that they do not deal with the more complex issues found in real-world systems.

I wanted to stress-test some of my own libraries by building a MVC application with a database that has a more realistic level of complexity. For this I chose Microsoft’s sample AdventureWorksLT2012 database, which is a cut-down version of the larger AdventureWorks OLTP database. The AdventureWorks database is for a fictitious multinational manufacturing company producing and selling pedal cycles. (see pdf of database diagram).

In the first part of this article I looked at the problems of getting Entity Framework (EF) to work with an existing database like AdventureWorks. In this second article I look at how to build an MVC web application that provides a comprehensive user interface, as would be used by an employee of AdventureWorks. In doing this I try to pull out some of the key design and implementation issues that software developers face when working on real-life business systems.

The issues faced in building a web application

One of the fundamental issues that we face in building a complex web application is that the business data held in the database is often not in the right format to display directly to the user. Let me give you a simple example from AdventureWorks.

I wanted to put up a list of existing companies that had bought from AdventureWorks before. This needed to quickly summarise who they were and give an idea of what level of purchasing they had done in the past. To do that I came up with the following list, as shown below:

Customer list

Now, to produce this I had to:

  1. Get the company name from the Customer table – easy
  2. Get the contact name. That needed me to combine the following properties from the Customer table:
    • Title, which can be null
    • FirstName
    • MiddleName, which can be null
    • LastName
    • Suffix, which could be null
  3. For the Total Sales Value, I needed to sum the TotalDue property in the SalesOrderHeader table.

On top of that I want to:

  1. Have a pre-filter to only show customers who had bought before in this list.
  2. Be able to search by a company or contact name.
  3. Sort on any column.
  4. Be able to page the data, i.e. only show a selection of say 10 or 20 customers at a time and be able to step forwards/backwards through the list. This is because there are hundreds of customers.

This is a very typical requirement and it happens hundreds of times in a reasonably sized application. The architecture of the system needs careful thought as to how to provide these capabilities with a) good performance and b) the minimum of development effort. That is a challenge.

The power of LINQ with Entity Framework

Before I go on to the solution to this challenge I want to delve into the two technologies that form the cornerstone of my solution: LINQ/IQueryable and Microsoft’s data access technology, Entity Framework (EF).

Note: for those of you who are familiar with LINQ and EF you can skip to the next section, ‘Separation of Concerns and Layered Architecture’.

a. LINQ/IQueryable

LINQ (Language-Integrated Query) is a way of writing queries on data. The important point is that any query written in LINQ is held as a query and only run when certain commands, like .ToList(), are applied to it. This allows software developers to write part of a filter/select command that can be added to/extended later anywhere in the application that understands IQueryable.

Let me take a simple example using a subset of Customer List problem posed above. First let us select the customers who have bought from us before:

var realCustomers = Customers
                    .Where (c => c.SalesOrderHeaders.Any());

Then I select the CompanyName and form the ContactName from this filtered list:

var details = realCustomers.Select( x => new
   {
       CompanyName = x.CompanyName,
       ContactName = x.Title + " " + x.FirstName + " " +
                     x.MiddleName + " " + x.LastName +
                     " " + x.Suffix
   });

This returns an IQueryable<> class, but it has not been ‘realised’, i.e. it is just a command and has not accessed the database yet. This means I can add the following filter commands onto the query:

var filteredQuery = details.Where (x => x.CompanyName.Contains("Bike"));

Finally I might add paging to that query, e.g.

var finalQuery = filteredQuery.OrderBy(x => x.CompanyName)
                 .Skip(pageSize*pageNumber).Take(pageSize);

Hold that finalQuery as we step onto the next part of the solution which is…

b. Entity Framework

EF is a database access framework that works by converting LINQ queries into SQL Server database commands, i.e. T-SQL code, and it does it very well. So, if we set pageSize to 5 and pageNumber to 0 and then realise the finalQuery mentioned previously by putting a .ToList() on the end, we get the results on the right.

EF has converted the finalQuery into T-SQL and applied it to the database. If we look at the final SQL command then we will see that it has:

  1. Only selected the columns that it needs.
  2. The concatenation of the names has been turned into a T-SQL string addition, including replacing null items with an empty string.
  3. The two .Where()’s, one looking for customers who have bought and the other looking for company names which contain “Bike”, are combined into a single SQL WHERE clause.
  4. The SQL command orders the data by CompanyName and then does a Skip + Take to return only the data that was requested.

For those who are interested here is the SQL command (text file) that EF produced from these concatenated queries.

Separation of Concerns and Layered Architecture

There is a software approach, almost a rule, called Separation of Concerns. A Microsoft document defines the term as follows:

Separation of Concerns: Divide your application into distinct features with as little overlap in functionality as possible. The important factor is minimization of interaction points to achieve high cohesion and low coupling…
(Microsoft, Software Architecture and Design 2009, Key Principles of Software Architecture)

Applying Separation of Concerns to what we have learnt about LINQ and EF, I have separated the application into Layers (see Multilayered architecture). In .NET these are separate projects/assemblies but are bound together into one application, i.e. there are direct method calls between layers. (Note: in today’s cloud systems, such as Azure, scaling up is done by having the whole application running multiple times rather than the previous approach of having each layer running on a separate server.) My layers are:

  1. Data Layer: sometimes called the Infrastructure Layer, or Persistence Layer.
    This deals with reading and writing data to the database. It should provide IQueryable versions of each table/class for reading. Depending on the software design used it will either allow direct write/update of a database class, or, if using Domain-Driven Design, may impose some factories/methods that need to be called to accomplish this.
  2. Business Layer:
    This will contain all the business logic that goes beyond simple read/write of data. For instance in the AdventureWorks system a new order starts in the ‘Pending’ state and needs to be ‘Approved’ before it can be taken forward. That sort of logic should live in this Layer and it will directly interact with the Data Layer.
  3. Service Layer: sometimes called the Application layer.
    This section’s job is to format the data flowing from the Data Layer and the Business Layer to/from the format that the Presentation Layer needs. It may need specialised handling for changing the data in the Data Layer if a Domain-Driven Design approach is used.
  4. Presentation Layer: sometimes called the UI (User Interface) Layer.
    This section’s job is to present the formatted data from the Service Layer either to a user via a web page, or to another system via, say, a RESTful interface. This section will also handle any paging, filtering, sorting etc. as required.

As you can see I have apportioned different parts of the query handling to different layers, i.e. queries/updates can be passed up the system and modified as they move through the layers.

Before I leave this section I should say this is one of the possible ways in which a system could be architected. I recommend the book ‘Microsoft .NET: Architecting Applications for the Enterprise’ for an excellent discussion of other approaches.

The Service Layer is key

My experience is that the Service Layer is one of the most useful layers because it totally separates the data model, i.e. the key data classes/tables that we have in the system, from the presentation layer.

Why is the service layer so useful? Because in any reasonably complex application the data model should be defined by the business process/data and not by how the user wants to see the data. By adding a service layer to ‘adapt’ the data we keep the data model focused solely on the business needs and do not allow it to be contaminated by features that are only used for display.

Obviously we need another class more attuned to the Presentation Layer’s needs. These are often called Data Transfer Objects (DTOs). In my system the Service Layer methods do two things:

  1. They take on the work of reformatting the data in the Data Layer or Business Layer to/from a DTO, which holds the data in a format that is suitable for the Presentation Layer.
  2. They call the correct methods in the Data Layer or Business Layer to execute certain commands. The most obvious are the Create, Update or Delete commands to the Data Layer to update data held in the database.

The only problem is that the Service Layer has a lot of really repetitive code, which is boring to write! So I set about fixing that with a library called GenericServices.

GenericServices and how it solves certain problems

GenericServices uses C# Generics to provide a set of CRUD (Create, Read, Update and Delete) services to the database. The developer only needs to write a DTO which inherits from a GenericServices template DTO to gain access to GenericServices’ CRUD methods, which include intelligent copying of data layer classes to/from DTOs.

You might not want to use GenericServices but you might find it useful to understand how it solves certain problems when working with the AdventureWorksLt2012 database. Note: If you do want to look at GenericServices, which is an open-source project, then you can find out more about GenericServices on GitHub or via the SampleMvcWebApp site.

In the next four sections I describe various issues that I came across in my design and how I solved them in GenericServices.

1. Copying properties to/from Data Layer classes and DTOs using LINQ

I think I have made a case for using DTOs – they reduce the data required and allow data from various relationships to be pulled together into one place (called flattening). I am also saying that using LINQ commands to select the data means EF can make a very efficient database access, only pulling in the specific cells needed by the Presentation Layer.

In my early projects I hand-coded the data to DTO and DTO to data code. This worked but was incredibly repetitive, boring and needed Unit Tests to check I had not fallen asleep during the process. I was committed to automating this process, which I have achieved in GenericServices.

GenericServices uses AutoMapper for the task of data<->DTO copying. AutoMapper is an excellent, convention-based mapper that can handle some complex mappings between two classes, i.e. the data class and the DTO. I had come across AutoMapper years ago, but had rejected it because it did not support LINQ commands. However it now supports LINQ projections, which means it converts the copy into a LINQ expression which EF then uses to make an efficient SQL query.

So, my solution was to put AutoMapper at the core of my data<->DTO copying process and I have not been disappointed. If you have a similar need to copy data between similar classes then you should look at the AutoMapper Wiki site and Jimmy Bogard’s AutoMapper blogs for newer information.
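
To make that concrete, below is a minimal sketch of AutoMapper’s LINQ projection in use. This is purely illustrative and not the GenericServices internals: ListCustomerDto is the DTO shown later in this article, and the exact API (ProjectTo<T>() from the AutoMapper.QueryableExtensions namespace, here using the static Mapper configuration) varies between AutoMapper versions.

//set up the convention-based mapping once, typically at application start
Mapper.CreateMap<Customer, ListCustomerDto>();

var customers = db.Customers
    .Where(c => c.SalesOrderHeaders.Any())   //still an IQueryable, nothing read yet
    .ProjectTo<ListCustomerDto>()            //the data->DTO copy becomes part of the SQL SELECT
    .ToList();                               //one SQL query that only reads the DTO's columns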

2. Adding new calculated properties for display or editing

The other issue you saw in the Customer List example at the start was that I needed to come up with some new, calculated values to use to help display the customer list. In early projects I simply hand-coded these using LINQ expressions. Getting a solution to this was a little harder.

Just to recap, from the earlier example the Customer list needs three calculated properties:

  1. HasBoughtBefore, which should be true if the customer has any sales orders
  2. FullName, which combines the parts of the name to make something sensible for the user.
  3. TotalAllOrders, which sums all the order values to give a total.

I initially used custom mapping in AutoMapper, where I basically inserted my hand-coded LINQ expressions. This works but Jimmy Bogard’s AutoMapper blog put me onto a very elegant solution called DelegateDecompiler which turns calculated properties into LINQ expressions. That may not make sense so let me give an example.

If I added a calculated property to the EF data class as shown below, then using it in a LINQ expression would throw an exception, as EF cannot handle properties that contain calculations.

public bool HasBoughtBefore { get { return SalesOrderHeaders.Any(); } }

What DelegateDecompiler does is allow you to mark such properties with a [Computed] attribute and then add a call to .Decompile() to your query. This turns the calculated properties into LINQ expressions, which means EF can then execute that code.

So in my data layer I can add the following partial class to the Customer data class to add two of the three properties I need.

public partial class Customer : IModifiedEntity
{
    [Computed]
    public string FullName { get { return Title + " " + FirstName
                                   + " " + LastName + " " + Suffix; } }

    /// <summary>
    /// This is true if a 'Customer' has bought anything before
    /// </summary>
    [Computed]
    public bool HasBoughtBefore { get { return SalesOrderHeaders.Any(); } }
}

GenericServices notices that the class has some properties marked with the [Computed] attribute and invokes DelegateDecompiler’s .Decompile() method before passing the query up to the Presentation Layer.
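
If you are not using GenericServices you can do the same thing by hand. Below is a hedged sketch, assuming the partial Customer class shown above and the DelegateDecompiler namespace being in scope:

var buyers = db.Customers
    .Where(c => c.HasBoughtBefore)                    //a [Computed] property
    .Select(c => new { c.CompanyName, c.FullName })   //another [Computed] property
    .Decompile()                                      //expand the computed properties into LINQ
    .ToList();                                        //now EF can turn the whole query into SQL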

DelegateDecompiler is a very elegant but complicated library, so some types of LINQ commands are not supported. In particular a calculation like TotalAllOrders does not work, nor does it support properties that need to access a local variable. Therefore I have also provided an entry point in GenericServices to allow additional AutoMapper mappings to handle these complex cases. See the overridden AddedDatabaseToDtoMapping property in the GenericServices DTO definition below.

public class ListCustomerDto : EfGenericDto<Customer, ListCustomerDto>
{

    // … other properties removed for clarity 

    /// <summary>
    /// This will hold the total value of all orders for this customer
    /// </summary>
    [Display(Name = "Total Sales Value")]
    [DataType(DataType.Currency)]
    public decimal TotalAllOrders { get; set; }

    // … other code removed for clarity

    protected override Action<IMappingExpression<Customer, ListCustomerDto>>
        AddedDatabaseToDtoMapping
    {
        get
        {
            return m => m.ForMember(d => d.TotalAllOrders,
                opt => opt.MapFrom(c =>
                       c.SalesOrderHeaders.Sum(
                           x => (decimal?)x.TotalDue) ?? 0));
        }
    }
}

Note: The LINQ calculation for TotalAllOrders is complex because it is summing a property in a collection relationship. Because a collection can be empty LINQ correctly insists on the nullable version of Sum. This returns null for empty collections, which I replace with zero.

Presentation Layer

Before I leave this example you might like to see the MVC controller code that displays the result. My design aim is to make the code in the MVC controller as simple and generic as possible. This will allow me to use scaffolding for building the controller code and the views. Below is the code for displaying a list.

public class CustomersController : Controller
{
    [DisplayName("Current Customers")]
    public ActionResult Index()
    {
        //the ‘true’ tells the Grid to only show rows where HasBoughtBefore is true
        return View(true);
    }

    public JsonResult IndexListRead(
        [DataSourceRequest]DataSourceRequest request,
        IListService service)
    {
        var result = service.GetAll<ListCustomerDto>()
                     .OrderBy( x => x.CustomerID).ToDataSourceResult(request);
        return Json(result, JsonRequestBehavior.AllowGet);
    }
    //… more create, update, delete method left out for clarity
}

Note the IListService, which is the GenericServices List service, on which we call GetAll<ListCustomerDto>() to get the formatted DTO data. I then display this using a Kendo UI Pro Grid, which takes an IQueryable query and adds its own filter, sort, group and paging controls to the query before executing it. Note: Kendo UI Pro is a paid-for UI library for MVC and JavaScript. The result is a grid with paging, sorting, filtering etc.

Analysis of final SQL, and resulting changes

Finally I used Red Gate’s ANTS Performance Profiler to analyse the final SQL command (text file) that EF produced after all these different parts were combined. The display is for existing customers, i.e. HasBoughtBefore is true, and filters the results to only include customers whose company name contains “bike”.

I am no SQL expert, but looking at the SQL command produced it seemed to me that the sum of SalesOrderHeader.TotalDue was done twice, once to see if it was null and then again if it wasn’t null. This didn’t seem too efficient so I changed the LINQ code to first check if there were any SalesOrderHeaders before attempting the sum. The new LINQ code to define the TotalAllOrders property is shown below:

SalesOrderHeaders.Any() ? SalesOrderHeaders.Sum(x => x.TotalDue) : 0;

This produced the following SQL command (text file), which a) seems simpler and b) was faster than the original SQL command. As I say I am not a SQL expert but I would say this is an improvement.
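
For completeness, here is how that revised calculation slots into the same AddedDatabaseToDtoMapping override shown earlier (same DTO and property names as before):

protected override Action<IMappingExpression<Customer, ListCustomerDto>>
    AddedDatabaseToDtoMapping
{
    get
    {
        //the Any() test avoids the nullable Sum and produced the simpler SQL
        return m => m.ForMember(d => d.TotalAllOrders,
            opt => opt.MapFrom(c => c.SalesOrderHeaders.Any()
                   ? c.SalesOrderHeaders.Sum(x => x.TotalDue)
                   : 0));
    }
}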

In real life you should look at areas of poor performance and then apply increasing levels of refactoring, right up to using direct SQL commands, to obtain the performance you need. However this does take development time so in some projects a quickly written but possibly poor performing database access may be acceptable.

3. Handling composite keys

Quite a few of the tables in the AdventureWorks Lite database have composite keys, i.e. a primary key made up of two or more columns in the table. Now GenericServices is designed to handle composite keys and the Detail, Update and Delete services all take multiple keys. However there is one table in the AdventureWorks Lite database where the keys can be a challenge: CustomerAddress. It is worth pointing out how to handle this in EF.

The picture below shows that the Customer has a collection of CustomerAddresses, each of which consists of a property called AddressType, which says if it is their main office etc., and a link to a standard Address class. CustomerAddress has a composite key consisting of the CustomerID and the AddressID.

CustomerAddress table with relationships

So, the problem comes when creating a new CustomerAddress. We need to create the CustomerAddress and the Address in the same commit so that they are consistent, but we also need to get the new AddressID to fill in part of the composite key on the CustomerAddress.

The answer is that EF does this as part of its relational fixup. To get this to work I create the CustomerAddress and set its EF ‘Address’ relational property to a new Address instance. We add the CustomerAddress to the database using db.CustomerAddresses.Add(myNewCustomerAddress), call EF’s SaveChanges(), and it’s done.
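
Below is a minimal sketch of that create. The property values are made up, but the DbSet and column names are taken from the AdventureWorksLT tables:

var myNewCustomerAddress = new CustomerAddress
{
    CustomerID = customerId,      //one half of the composite key, already known
    AddressType = "Main Office",
    Address = new Address         //EF inserts this row first and fixes up AddressID
    {
        AddressLine1 = "123 Any Street",
        City = "Seattle",
        StateProvince = "Washington",
        CountryRegion = "United States",
        PostalCode = "98101"
    }
};
db.CustomerAddresses.Add(myNewCustomerAddress);
db.SaveChanges();                 //both rows are inserted in the same transaction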

4. Deleting database classes that have relationships

When deleting items from the database that have relationships, there are a few issues we need to think about. Firstly, if something is ‘pointing to’ that item, in SQL terms it is referencing that key, and this reference has been enforced by a foreign key constraint. Because of this constraint, you cannot delete the row if it is being referenced by another table. You need to remove the foreign keys that reference the key value for the row; in our case, we’d want to delete all rows that reference the row we wish to delete. You also need a way of handling this gracefully so the user is not faced with a ‘There was an Error’ message.

In GenericServices I have used a layered approach. You can add your own code to check before you delete, but by default the GenericServices Delete method relies on a special .SaveChangedWithValidation() method which catches certain SQLExceptions. In this case the SQL error code is 547, which is a constraint error triggered on an attempt to delete a row that another row is relying on. I describe this approach in a previous Simple-Talk article called, Catching Bad Data in Entity Framework.
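
If you want to do something similar without GenericServices, a rough sketch of that kind of check looks like this (db is the DbContext; DbUpdateException comes from System.Data.Entity.Infrastructure and SqlException from System.Data.SqlClient):

try
{
    db.SaveChanges();
}
catch (DbUpdateException ex)
{
    var sqlException = ex.GetBaseException() as SqlException;
    if (sqlException != null && sqlException.Number == 547)
    {
        //a foreign key constraint stopped the delete -
        //report a friendly 'this item is still in use' message to the user
    }
    else
    {
        throw;   //not a constraint problem, so let it bubble up
    }
}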

5. Properly deleting rows and their relationships

The other delete issue is the opposite of the previous one. What happens if I want to delete a database row that has a relationship to other database entries? In cases where the relationship is only between these two items then the lower relationship is often set as ‘cascade on delete’, so deleting the main row will cause its child relationship to be deleted. However there can be complex cases where you have to handle this yourself. That is true of the CustomerAddress and its Address listed in the last section.

When I delete a CustomerAddress I most likely want to delete the Address entry that it points to. I say ‘most likely’ as it is quite possible that the Address part was used in an old Sales Order as the delivery or billing address. Because of this dual use there is no cascade delete on the Address, so we need to do this by hand. In GenericServices I added a DeleteWithRelationships(Func<>) method which allows a small helper Func<> to be written by the developer that looks after any relationship deletion within the same SQL commit as the primary delete. That way, if the delete of the Address entry fails because it is being used as a delivery or billing address, then the delete of the CustomerAddress is rolled back. The system also provides a rather generic, but friendly, message.
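
A hand-coded sketch of the same idea, without GenericServices, deletes both rows in a single SaveChanges() so they succeed or fail together (the DbSet names are assumptions based on the AdventureWorksLT tables, and the lambda form of Include needs System.Data.Entity):

var custAddress = db.CustomerAddresses
    .Include(ca => ca.Address)     //load the linked Address as well
    .Single(ca => ca.CustomerID == customerId && ca.AddressID == addressId);

db.Addresses.Remove(custAddress.Address);    //delete the child Address...
db.CustomerAddresses.Remove(custAddress);    //...and the CustomerAddress itself
db.SaveChanges();   //one commit: if the Address is still referenced by an old order,
                    //the constraint error rolls back both deletes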

 

Any down sides to this approach?

It is always good to think of what could be done better – that is how I got to write GenericServices. Looking at the approach outlined above I would make these comments:

  1. The proposed architecture is very reliant on LINQ and its conversion to database commands. I don’t think this is a major problem, but it does limit you to providers that handle LINQ. Here are some providers.
    1. Entity Framework (EF) is acknowledged as having the best LINQ-to-SQL conversion. EF currently only supports Microsoft SQL Server, although the EF v7 roadmap includes support for other database types.
    2. NHibernate, MySql and other database access providers have LINQ-to-database converters and support a lot of database types. However I cannot vouch for the depth of their implementations.
    3. You can write your own LINQ data source provider already, and it is said to become easier in EF v7, which is out ‘soon’.
  2. LINQ/IQueryable is not easily serializable, so a system that has separated systems talking over some message bus would not work. However current scaling practice, see Azure’s How to Scale an Application, is to run multiple instances of the same application which works for this approach.
  3. Some people say that passing IQueryable right up to the Presentation Layer is an anti-pattern. The main concern is that used wrongly a developer can add a .ToList() to the end of a query and load thousands of rows of data. That is a concern, which is why I make the Presentation layer calls very simple and standard. My long term aim is to scaffold most of Presentation Layer, but I need to do more projects before I really know what is needed.
  4. Similar to point 2, but in untrained hands software developers can write really bad queries that hit performance issues. This is true of nearly every ORM (Object-Relational Mapping) software system, but EF is big and it has taken me years to really understand what EF is doing under the hood. However the fact is that the software development part of a project is likely to be the biggest cost, and a good ORM, plus training, is likely to reduce timescales.

Finally I should say that I describe a Layered Architecture, but for really big systems this may not scale enough. However the overall approach of using LINQ and separating different parts of the processing of the query would be valid in other architectures, like CQRS (Command and Query Responsibility Segregation) or within each service in a microservice architecture. I would recommend looking at the pdf of an excellent book called ‘Microsoft .NET: Architecting Applications for the Enterprise’ for a detailed discussion of other approaches.

Conclusions

Getting EF and GenericServices to work with the AdventureWorks Lite database was very revealing. Some things worked really well first time and some needed quite a bit of fiddling around. As a result of this I changed the implementation of GenericServices to better handle these more complex cases.

Overall it took about two weeks to build a web application which included List, Show Detail, Create, Update and Delete of Customers and Orders, plus adding/editing line items in an order, plus Product listing and lookup. Compared to hand-coding a web application like that, I think that is very fast, so I am pleased.

I hope you have found this article useful in describing an approach to building complex, database driven web applications. Hopefully it has suggested some approaches that may well save you time in the future.

Happy coding!

Entity Framework: working with an existing database

Last Updated: December 9, 2014 | Created: November 1, 2014
Quick Summary
This two-part article looks at the software issues raised when presented with a database that was designed and built prior to software development. Part 1 (this article) explores the low-level problems of connecting to that database using Entity Framework (EF). Part 2 looks at the high-level problems of transforming the EF data into a form suitable for display via an ASP.NET MVC web site.

Part 1: Using Entity Framework with an existing database

In many large-scale projects software developers often have to work with existing SQL Server databases with predefined tables and relationships. The problem can be that some predefined databases have aspects that are not so easy to deal with from the software side.

This is a two-part article following my experiences of putting together a web application using the AdventureWorksLT2012 database, which is a cut-down version of the larger AdventureWorks OLTP database. I am using Microsoft’s ASP.NET MVC5 (MVC) with the proprietary Kendo UI (KendoUI) package for the UI/presentation layer. The two parts to the article are:

  1. PART 1 (this article). The best way to get Microsoft’s Entity Framework version 6 (EF) data access tool to import and access the database.
  2. PART 2: (this link) This looks at the higher-level functions needed to transform that data into a user-focused display, in this case using the GenericServices open-source library to build a service/application layer to connect to a modern web front end.

Let us start on…

Using Entity Framework with an existing database

Entity Framework 6 has a number of features to make working with existing databases fairly straightforward. In this article I detail the steps I needed to take on the EF side to build a fully featured web application to work with the AdventureWorks database. At the end I also mention some other techniques that I didn’t need for AdventureWorks, but I have needed on other databases. The aim is to show how you can use EF with pre-existing databases, including ones that need direct access to T-SQL commands and/or Stored Procedures.

1. Creating the Entity Framework Classes from the existing database

Entity Framework has a well-documented approach, called reverse engineering, to create the EF Entity Classes and DbContext from an existing database, which you can read about here. This produces data classes with various Data Annotations to set some of the properties, such as string length and nullability (see below), plus a DbContext with an OnModelCreating method to set up the various relationships.

namespace DataLayer.GeneratedEf
{
    [Table("SalesLT.Customer")]
    public partial class Customer
    {
        public Customer()
        {
            CustomerAddresses = new HashSet<CustomerAddress>();
            SalesOrderHeaders = new HashSet<SalesOrderHeader>();
        }

        public int CustomerID { get; set; }

        public bool NameStyle { get; set; }

        [StringLength(8)]
        public string Title { get; set; }

        [Required]
        [StringLength(50)]
        public string FirstName { get; set; }

        [StringLength(50)]
        public string MiddleName { get; set; }

        //more properties left out to shorten the class...
        //Now the relationships

        public virtual ICollection<CustomerAddress> CustomerAddresses { get; set; }

        public virtual ICollection<SalesOrderHeader> SalesOrderHeaders { get; set; }
    }
}

This does a good job of building the classes. Certainly having the Data Annotations is very useful as front-end systems like MVC use these for data validation during input. However I did have a couple of problems:

  1. The default code generation template includes the `virtual` keyword on all of the relationships. This enabled lazy loading, which I do not want. (see section 1.1 below)
  2. The table SalesOrderDetail has two keys: one is the SalesOrderHeaderID and one is an identity, SalesOrderDetailID. EF failed on a create and I needed to fix this. (See section 1.2 below)

I will now describe how I fixed these issues.

1.1. Removing lazy loading by altering the scaffolding of the EF classes/DbContext

As I said earlier the standard templates enable ‘lazy loading’. I have been corrected in my understanding of lazy loading by some readers. The documentation states that ‘Lazy loading is the process whereby an entity or collection of entities is automatically loaded from the database the first time that a property referring to the entity/entities is accessed’. The problem with this is that it does not make for efficient SQL commands, as individual SQL SELECT commands are issued for each access to a virtual relationship, which is not such a good idea for performance.

As you will see in part 2 of this article I use a technique that ‘shapes’ the SQL read to load only the individual properties or relationships I need. I therefore do not need, or want, lazy loading of relationships.

Now you could hand-edit each generated class to remove the ‘virtual’, but what happens if (when!) the database changes? You would then reimport the database and lose all your edits, which you or your colleague might have forgotten about by then, and suddenly your whole web application slows down. No, the common rule with generated code is not to edit it. In this case the answer is to change the code generated during the creation of the classes and DbContext. This leads me on to…

Note: You can turn off lazy loading via the EF Configuration class too, but I prefer to remove the virtual keyword as it ensures that lazy loading is definitely off.
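
For reference, the configuration-based version is just two settings, which would normally go in the DbContext’s constructor (or in the scaffolding template that generates it):

Configuration.LazyLoadingEnabled = false;     //stop EF lazily loading relationships
Configuration.ProxyCreationEnabled = false;   //stop EF creating dynamic proxy classes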

The generation of the EF classes and DbContext is done using some t4 templates, referred to as scaffolding. By default it uses some internal scaffolding, but you can import the scaffolding and change it. There is a very clear explanation of how to import the scaffolding using NuGet here, so I’m not going to repeat it.

Once you have installed the EntityFramework.CodeTemplates you will find two files called Context.cs.t4 and EntityType.cs.t4, which control how the DbContext and each entity class are built respectively. Even if you aren’t familiar with t4 (a great tool) you can understand what it does – it’s a code generator and anything not surrounded by <# #> is standard text. I found the word virtual in the EntityType.cs.t4 and deleted it. I also removed virtual from the Context.cs.t4 file on the declaration of the DbSet<>.

You may want to alter the scaffolding more extensively, say to add a [Key] attribute on primary keys for some reason. All is possible, but you must dig into the .t4 code in more depth.

One warning about importing scaffolding – Visual Studio threw a nasty error message when I first tried to import using the EntityFramework.CodeTemplates scaffolding (see stackoverflow entry). It took a bit of finding, but it turns out that if you have Entity Framework Power Tools Beta 4 installed then they clash. If you have Entity Framework Power Tools installed then you need to disable it and restart Visual Studio before you can import/reverse engineer a database. I hope that gets fixed as Entity Framework Power Tools is very useful.

Note: There are two other methods to reverse engineer an existing database:

  1. EntityFramework Reverse POCO Code First Generator by Simon Hughes. This is a Visual Studio extension recommended by the EF guru, Julia Lerman, in one of her MSDN magazine articles. I haven’t tried it, but if Julia recommends it then it’s good.
  2. Entity Framework Power Tools Beta 4 can also reverse engineer a database. It’s quicker, only two clicks, but it’s less controllable. I don’t suggest you use this.

1.2 Fixing a problem with how the two keys are defined in SalesOrderDetail table

The standard definition of the SalesOrderDetail table’s key parts is as follows:

[Table("SalesLT.SalesOrderDetail")]
public partial class SalesOrderDetail
{
    [Key]
    [Column(Order = 0)]
    [DatabaseGenerated(DatabaseGeneratedOption.None)]
    public int SalesOrderID { get; set; }

    [Key]
    [Column(Order = 1)]
    public int SalesOrderDetailID { get; set; }

    //other properties left out for clarity...
}

You can see it marks the first as not database generated, but it does not mark the second as an Identity key. This caused problems when I tried to create a new SalesOrderDetail so that I could add a line item to an order. I got the SQL error:

Cannot insert explicit value for identity column in table ‘SalesOrderDetail’ when IDENTITY_INSERT is set to OFF.

That confused me for a bit, as other two-key items had worked, such as CustomerAddress. I tried a few things, but as it looked like an EF error I tried telling EF that SalesOrderDetailID was an Identity key by using the attribute [DatabaseGenerated(DatabaseGeneratedOption.Identity)]. That fixed it!

The best solution would be to edit the scaffolding again to always add that attribute to identity keys. That needed a bit of work and the demo was two days away, so in the meantime I added the needed attribute using the MetadataType attribute and a ‘buddy’ class. This is a generally useful feature so I use this example to show you how to do this in the next section.

3. Adding new DataAnnotations to EF Generated classes

Being able to add attributes to properties in already generated classes is a generally useful thing to do. I needed it to fix the key problem (see 1.2 above), but you might want to add some DataAnnotations to help the UI/presentation layer such as marking properties with their datatype, e.g. [DataType(DataType.Date)]. The process for doing this is given in the Example section of this link to the MetadataType attribute. I will show you my example of adding  the missing Identity attribute.

The process requires me to add a partial class in another file (see section 3 below for more on this), mark it with the [MetadataType(typeof(SalesOrderDetailMetaData))] attribute, and then place the SalesOrderDetailID property with its extra attribute in a new class, sometimes called a ‘buddy’ class. See below:

[MetadataType(typeof(SalesOrderDetailMetaData))]
public partial class SalesOrderDetail : IModifiedEntity
{
}

public class SalesOrderDetailMetaData
{
    [DatabaseGenerated(DatabaseGeneratedOption.Identity)]
    public int SalesOrderDetailID { get; set; }
}

The effect is to apply those attributes to the existing properties. That fixed my problem with EF creating new SalesOrderDetail entries properly and I was away.

2. What happens when the database changes?

Having sorted the scaffolding as discussed above, just repeat step 1, ‘Creating the Entity Framework Classes from the existing database’. There are a few things you need to do before, during and after the reimport.

  1. You should remember/copy the name of the DbContext so you use the same name when you reimport. That way it will recompile properly without major name changes.
  2. Because you are using the same name as the existing DbContext you must delete the previous DbContext, otherwise the reimporting process will fail. If it’s easier you can delete all the generated files as they are replaced anyway. That is why I suggest you put them in a separate directory with no other files added.
  3. When reimporting by default the process will add the connection string to your App.Config file again. I suggest you un-tick that otherwise you end up with lots of connection strings (minor point, but can be confusing).
  4. If you use source control (I really recommend you do) then a quick compare of the files to check what has changed is worthwhile.

3. Adding new properties or methods to the Entity classes

In my case I wanted to add some more properties or methods to the classes. Clearly I can’t add properties that change the database – I would have to talk to the DBA to change the database definition and import the new database schema again. However in my case I wanted to add properties that accessed existing database properties to produce more useful output, or to have an intention-revealing name, like HasSalesOrder.

You can do this because the scaffolding produces ‘partial’ classes, which means I can have another file which adds to that class. To do this it must:

  1. Have the same namespace as the generated classes
  2. The class is declared as public partial <same class name>.

I recommend you put them in a different folder to the generated files. That way they will not be overwritten by accident when you recreate the generated files (note: the namespace must be the original namespace, not that of the new folder). Below I give an example where I added properties to the Customer class. Ignore for now the IModifiedEntity interface (dealt with later in this article) and the [Computed] attribute (dealt with in part 2 of this article).

public partial class Customer : IModifiedEntity
{
    [Computed]
    public string FullName { get { return Title + " " + FirstName + " " + LastName + " " + Suffix; } }

    /// <summary>
    /// This is true if there are any sales orders. We use this to decide if a 'Customer' has actually bought anything
    /// </summary>
    [Computed]
    public bool HasSalesOrders { get { return SalesOrderHeaders.Any(); } }
}

Note that you almost certainly will want to add to the DbContext class (I did – see section 4 below). This is also defined as a partial class so you can use the same approach. Which leads me on to…

4. Dealing with properties best dealt with at the Data Layer

In the AdventureWorks database there are two properties called ‘ModifiedDate’ and ‘rowguid’. ModifiedDate needs to be updated on create or update and the rowguid needs to be added on create.

Many databases have properties like this and they are best dealt with at the Data/Infrastructure layer. With EF this can be done by providing a partial class and overriding the SaveChanges() method to handle the specific issues your database needs. In the case of AdventureWorks I added an IModifiedEntity interface to each partial class that has the ModifiedDate and rowguid properties.

Then I added the code below to the AdventureWorksLt2012 DbContext to provide the functionality required by this database.

public partial class AdventureWorksLt2012 : IGenericServicesDbContext
{
    /// <summary>
    /// This has been overridden to handle ModifiedDate and rowguid
    /// </summary>
    /// <returns></returns>
    public override int SaveChanges()
    {
        HandleChangeTracking();
        return base.SaveChanges();
    }

    /// <summary>
    /// This handles going through all the entities that have
    /// changed and seeing if we need to do anything.
    /// </summary>
    private void HandleChangeTracking()
    {
        foreach (var entity in ChangeTracker.Entries()
           .Where(e => e.State == EntityState.Added
               || e.State == EntityState.Modified))
        {
            UpdateTrackedEntity(entity);
        }
    }

    /// <summary>
    /// Looks at everything that has changed and
    /// applies any further action if required.
    /// </summary>
    /// <param name="entityEntry"></param>
    /// <returns></returns>
    private static void UpdateTrackedEntity(DbEntityEntry entityEntry)
    {
        var trackUpdateClass = entityEntry.Entity as IModifiedEntity;
        if (trackUpdateClass == null) return;
        trackUpdateClass.ModifiedDate = DateTime.UtcNow;
        if (entityEntry.State == EntityState.Added)
            trackUpdateClass.rowguid = Guid.NewGuid();
    }
}

The IModifiedEntity interface is really simple:

//This interface is added to all the database entities
//that have a modified date and rowGuid. Save Changes uses this
// to find entities that need the date updating, or a new rowguid added
public interface IModifiedEntity
{
    DateTime ModifiedDate { get; set; }
    Guid rowguid { get; set; }
}

5. Using SQL Stored Procedures

Some databases rely on SQL Stored Procedures (SPs) for insert, update and delete of rows in a table. AdventureWorksLT2012 did not, but if you need to, EF 6 has added a neat way of linking to stored procedures. It’s not trivial, but you can find good information here on how to do it.

Clearly if the database needs SPs for CUD (Create, Update and Delete) actions then you need to use them. However using EF’s CUD actions is easier from the software point of view, and EF’s CUD actions have some nice features. For instance EF has an in-memory copy of the original values and uses this for working out what has changed.

The benefit is that the EF updates are efficient – you update one property and only that cell in the row is updated. The more subtle benefit is in tracking changes and handling SQL security, i.e. if you use SQL column-level security (Grant/Deny) then, if that property is unchanged, we do not trigger a security breach. This is a bit of an esoteric feature, but I have used it and it works well.
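
As a small illustration (the customerId and email value are made up), loading an entity, changing one property and calling SaveChanges() produces an UPDATE that only sets that one column:

using (var db = new AdventureWorksLt2012())
{
    var customer = db.Customers.Find(customerId);   //EF keeps a copy of the original values
    customer.EmailAddress = "new.name@example.com";
    db.SaveChanges();   //the generated UPDATE only sets the EmailAddress column
}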

6. Other things you could do

This is all I had to do to get EF to work with an existing database, but there are other things I have had to use in the past. Here is a quick run through of other items:

6a. Using Direct SQL commands

Sometimes it makes sense to bypass EF and use a SQL command, and EF has all the commands to allow you to do this. The EF documentation has a page on this here which gives a reasonable overview, but I recommend Julia Lerman’s book  ‘Programming Entity Framework: DbContext’ which goes into this in more detail (note: this book is very useful but it covers an earlier version of EF so misses some of the latest commands like the use of SPs in Insert, Update and Delete).

For certain types of reads SQL makes a lot of sense. For instance in my GenericSecurity library I need to read the current sql security setup (see below). I think you will agree it makes a lot of sense to do this with a direct sql read rather than defining multiple data classes just to build the command.

var allUsers = db.Database.SqlQuery<SqlUserAndRoleRow>(
@"select mp.name as UserName, rp.name as RoleName, mp.type as UserType
from sys.database_role_members drm
join sys.database_principals rp on (drm.role_principal_id = rp.principal_id)
join sys.database_principals mp on (drm.member_principal_id = mp.principal_id)
ORDER BY UserName");

For SQL commands such as create, update and delete it is less obvious, but I have used it in some cases. For these you use the ExecuteSqlCommand method; see the example from Microsoft below:

using (var context = new BloggingContext()) 
{ 
    context.Database.ExecuteSqlCommand( 
        "UPDATE dbo.Blogs SET Name = 'Another Name' WHERE BlogId = 1"); 
}

Neither of these examples had parameters, but if you do need parameters then the SqlQuery and ExecuteSqlCommand methods can take them, and they are passed as SQL parameters, which protects against SQL injection attacks. The Database.SqlQuery method documentation shows this.
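
For example, a parameterised version of a read, using the same blogging example as above (blogId is a made-up variable), looks like this – the value is sent to SQL Server as a parameter, not concatenated into the command string:

var blogNames = context.Database.SqlQuery<string>(
    "SELECT Name FROM dbo.Blogs WHERE BlogId = @p0",   //@p0 is a positional parameter
    blogId).ToList();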

One warning on raw SQL commands: once you have run one, EF’s view of the database, some of which is held in memory, is out of date. If you are going to close/dispose of the DbContext straight away then that isn’t a problem. However if the command is followed by other EF accesses, read or write, then you should use the EF ‘Reload’ command to get EF back on track. See my stackoverflow answer here for more on this.
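
Here is a short sketch of that situation, again using the blogging example:

var blog = context.Blogs.Find(1);   //blog is now tracked in memory by EF
context.Database.ExecuteSqlCommand(
    "UPDATE dbo.Blogs SET Name = 'Another Name' WHERE BlogId = 1");
context.Entry(blog).Reload();       //re-reads the row so the tracked entity matches the database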

6b. SQL Transaction control

When using EF to do database updates via the .SaveChanges() method, all the changes are done in one transaction, i.e. if one fails then none of the updates are committed. However if you are using raw SQL updates, or a combination of EF and SQL updates, you may well need these to be done in one transaction. Thankfully EF version 6 introduced commands to allow you to control transactions.

I used these commands in my EF code to work with SQL security. I wanted to execute a set of SQL commands to set up SQL Security roles and grant/deny access, but if any one failed I wanted to roll back. The code to execute a sequence of SQL commands and roll back if any single command fails is given below:

using (var dbContextTransaction = db.Database.BeginTransaction())
{
    try
    {
        foreach (var text in sqlCommands)
        {
            db.Database.ExecuteSqlCommand(text);
        }
        dbContextTransaction.Commit();
    }
    catch (Exception ex)
    {
        dbContextTransaction.Rollback();
        //report the error in some way
    }
}

You can also use the same commands to mix raw SQL commands and EF commands in one transaction. See this EF documentation for an example of that.
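
As a rough sketch of what that looks like, based on the EF documentation’s blogging example, you can mix ExecuteSqlCommand and SaveChanges() inside one BeginTransaction:

using (var context = new BloggingContext())
using (var transaction = context.Database.BeginTransaction())
{
    try
    {
        context.Database.ExecuteSqlCommand(
            "UPDATE dbo.Blogs SET Rating = 5 WHERE Name LIKE '%Entity Framework%'");

        var blog = context.Blogs.Find(1);
        blog.Name = "Renamed inside the same transaction";
        context.SaveChanges();    //uses the same open transaction

        transaction.Commit();     //the raw SQL update and the EF update commit together
    }
    catch (Exception)
    {
        transaction.Rollback();   //any failure rolls back both
    }
}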

Conclusion

There were a few issues to sort out but all of them were fixable. Overall, getting EF to work with an existing database was fairly straightforward, once you know how. The problem I had with multiple keys (see section 1.2) was nasty, but now that I, and you, know about it we can handle it in the future.

I think the AdventureWorks Lite database is complex enough to be a challenge: with lots of relationships, composite primary keys, computed columns, nullable properties etc. Therefore getting EF to work with AdventureWorks is a good test of EFs capability to work with existing SQL databases. While the AdventureWorks Lite database did not need any raw SQL queries or Stored Procedures other projects of mine have used these, and I have mentioned some of these features at the end of the article to complete the picture.

In fact version 6 of EF added a significant amount of extra features and commands to make mixed EF/SQL access very possible. The more I dig into things the more goodies I find in EF 6. For instance EF 6 brought in Retry logic for Azure, Handling transaction commit failures, SQL transaction control, improved sharing of connections between SQL and EF, plus a number of other things. Have a good look around the EF documentation – there is a lot there.

So, no need to hold back on using Entity Framework on your next project that has to work with an existing SQL database. You can use it in a major role as I did, or now you have good connection sharing just use it for the simple CRUD cases that do not need heavy T-SQL methods. Happy coding.

Now read Part 2, which looks at the data-to-user-interface side.

Is the Repository pattern useful with Entity Framework? – part 2

Last Updated: February 21, 2018 | Created: September 19, 2014
Quick Summary of Part 2
Having decided that the Repository/Unit of Work pattern does not add anything to Entity Framework (EF), the question was how to tame EF to work with Domain-Driven Design principles. This post details my new approach to using EF and why it is better.

This is a series:

  1. Part 1: Analysing whether Repository pattern useful with Entity Framework
  2. Part 2: Four months on – my solution to replacing the Repository pattern (this article).
  3. UPDATE (2018): Big re-write to take into account Entity Framework Core, and further learning.

Four months ago I was reviewing the design of Spatial Modeller™, version one, and I came across people who said that the Repository/Unit of Work pattern was not useful with Microsoft’s main data access technology, Entity Framework (shortened to EF from now on).

See items marked with ‘UPDATED 2015‘ for new comments added eight months on from the first article.

And ‘UPDATED 2017′ for new comments added from my book Entity Framework Core in Action, which I am writing for Manning Publishing.

At that time I wrote a blog post called ‘Is the Repository pattern useful with Entity Framework? – part 1‘ where I looked critically at what people said. At the time I was fairly convinced, but I needed to write some real code to see how I would do things without a repository.

Well, four months on the answer is – I’m convinced. The Repository/Unit of Work pattern does NOT add anything to EF. In fact it hides some useful features that a repository has trouble using. (UPDATE 2015: eight months on I am even more convinced, as I have now built some more representative applications).

However the issue is – what replaces the good bits of the repository, like handling Domain-Driven Design (DDD) which tries to treat the data persistence library as a simple in-memory collection. This means we want to hide some of EFs code, but still be able to use all of its features. It has taken me a little while to refine my approach, but here are my thoughts so far.

One size does not fit all

After some trials it became clear that three parts of the application needed to interact with the database:

  1. The User Interface (UI) which needs to display data in lots of different ways and often needs help when editing data, i.e. it might have to provide the user with a list of options using a dropdown list.
  2. The WebApi or Web Service, which is similar to the User Interface but has less need for user selection from a list.
  3. The Business Layer which needs to hide some of the EF code and make the database feel like a searchable collection.

Realising that there were different parts was a breakthrough as I could now design different ways to interface to EF to meet the different requirements. I combined the UI and the WebApi as they are very similar and then provided a different approach for the business layer.

Design 1: CRUD data access for User Interface layer and WebApi

The front-end of any application needs to do CRUD operations: that is Create, Read, Update and Delete of entries. Create and Update can be particularly difficult as it might need other data to guide the user. For instance on my example site the create/update of a Post needs the user to select the right Tags and, in my example, the Author (try http://samplemvcwebapp.net/Posts and then edit an entry). That needs extra data, which is where Data Transfer Objects (DTOs) come in.

I am a strong proponent of DTOs: I have used them heavily and I find they really help decouple the data and business layers from the UI or WebApi. DTOs are also good at dealing with the issue of Aggregates, that is a class that has subclasses under it. For instance you might have a class called Order, which has a collection of LineItems under it, and most likely a link to a Customer. DTOs can extract all the various parts and either ‘flatten’ them into one class, or for collections extract only the parts of the data that the user wants/is allowed to see.

The important point is to design the transfers for reads from the database so that they turn into efficient SQL requests. This needs a bit of thought, but isn’t that hard to do. Using LINQ and IQueryable<T> requests allows paging and filtering to be added at the last minute at the UI/WebApi end. Also, adding the .Select function allows only the columns/properties you need to be read in. The benefit of all this is that the SQL request only asks for those rows and properties that the request needs, which makes the request as efficient as possible.

So an approach that uses DTOs is good. The only downside is that writing the data access code for DTOs is repetitive and boring. I therefore looked at how to automate this. As I said in part 1 of this post I thought I might use T4 code generation. However in the end I pursued a route using .NET Generics. The result is an Open Source library called GenericServices, which provides List, Detail, Create, Update and Delete functions either on the EF data classes themselves or, more interestingly, via DTOs linked to EF data classes.

Design 1: Implementation (UPDATED 2015)

Rather than describe how this all works here I have build two example web sites.

1. SampleMvcWebApp.net, which is a simple application to show GenericServices in action.

2. (UPDATE 2015) Complex.SampleMvcWebApp, which is a much more realistic example of a real application. The interesting thing for me is that it only took me two weeks to write the whole application (well, plus a few days for adding supporting comments and a few tweaks to improve it). That is way faster than my previous applications and proved to me the power of GenericServices.

I have written two articles that describe the design of the Complex.SampleMvcWebApp site. They are:

(UPDATE 2015) The GenericServices Wiki now contains lots of information and GenericServices is now available on NuGet – see GenericServices on NuGet. This means you can try GenericServices yourself.


Design 2: The Business Layer (UPDATED 2015)

UPDATE: I now have an improved approach to handling business logic, which you can find in the article Architecture of Business Layer working with Entity Framework (Core and v6) – revisited and in Chapter 4 of the book I am writing for Manning Publishing, Entity Framework Core in Action.

For the business layer I have taken a different approach. From a design/architecture point of view I do three things.

  1. I allow the business layer to directly access the data via Entity Framework (EF). No repository, no unit of work, just good old EF. This means that I have access to all of EF’s commands and nothing is hidden from me.
  2. The problem is that EF on its own can be a bit of a handful, so I tame it with ‘Query Objects’. Mainly I want to make repetitive or complicated EF code appear as simple methods with intention-revealing names. See the implementation section for an example of this.
  3. In the business layer I try to treat the database as an in-memory collection with some search procedures, i.e. Query Objects. I try not to use the data access methods other than for loading or inserting new entries. Any other commands are hidden in Query Objects or other methods.

Let me explain how this works by talking about the implementation.

Design 2: Implementation

So the implementation of using EF in the business layer consists of four parts:

1. Building ‘Query Objects’ to hide some of the complex filtering/loading

UPDATE 2017: In chapter 2 of the book Entity Framework Core in Action I describe building a book listing with sorting, filtering and paging using a series of query objects. You can see a live example site showing those query objects in action.

In the first post I mentioned that people were talking about Query Objects, which are an encapsulation of particular database queries or updates. In C#, extension methods are a great tool for building Query Objects. If you have seen LINQ fluent commands then that is how extension methods work. Below is an example from Spatial Modeller™ which loads all the schemes whose key is in the array parameter, together with each scheme’s layers, members, attributes and services.

public static IQueryable<SmScheme> GetSchemesByIdIncludingAllChildClasses
    ( this SpatialModellerDb db, ICollection<int> schemeIdsToLoad)
{
    return db.SmSchemes
        .Include(x => x.Grades)
        .Include(
            x => x.Layers.Select(y => y.Members
                              .Select(z => z.Attributes)))
        .Include(
            x =>
                x.Layers.Select(y => y.Members
                               .Select(z => z.ServicesHandled)))
        .Where(
            x => schemeIdsToLoad.Contains(x.SmSchemeId));
}

This code hides the EF code needed to perform the database action inside a nicely named method. This code would be in the data/infrastructure layer, which means the EF code is more centralised in one place. I would recommend the article ‘Giving Clarity to LINQ Queries by Extending Expressions‘, which has some excellent ideas and source code for building ‘query objects’.
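
As a rough usage sketch (the calling code and the ids are hypothetical), the business layer can then read almost like a domain statement, with the EF detail hidden behind the name:

//Hypothetical call from the business layer - the EF Include/Where detail stays hidden
var schemesToModel = db.GetSchemesByIdIncludingAllChildClasses(new[] { 1, 5, 7 })
                       .ToList();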

2. Using Domain-Driven Design (DDD) principles

I really like the thinking behind DDD. It has made me a better programmer. While DDD accepts that the way data is stored/persisted will have an effect on the design (see Eric Evans’ book, page 159), the aim is to allow Business Layer code to treat the database as an in-memory collection with some search procedures, with as little interaction as possible with the data access code.

This isn’t done just to hide the database, but mainly to ensure that the developers working in the business layer are thinking about the business problem first, not how the database works. DDD recognises that there might need to be some implementation compromises to work with the database, but the overall aim is to keep the focus on the business problems.

The other issue that affects the approach used is that the actual data saved in the database will be designed for the business layer. Therefore the business layer will use the data classes directly, as they are a good fit to the business problem.

Update after a lot more business layer work

I still use a DDD approach, but I have found that EF doesn’t really support a pure DDD design, for a number of reasons. Firstly, the EF database classes need to live at the data layer level, so these classes aren’t really the right place to hold business logic (I have a separate layer/assembly for that). Also, EF sets certain restrictions on the way collections are handled, which means you can’t use an IReadOnlyCollection. That makes it hard to enforce a DDD root/aggregate approach, where the sub-classes of a DDD root can only be changed via the root. Because of this I still follow a DDD design approach, but some of the more stringent access rules have to be implemented simply by setting standards, not by programmatically enforced rules.

Even so my business layer is able to use the DDD principle of treating the database as just classes, which makes the business logic much simpler. To do this a typical business pattern has three phases:

  1. Read in any data it needs for the business logic.
  2. Some pure business logic working on the data classes – the bulk of the code.
  3. An optional add/remove phase if any new data needs to be added to the database (note: updates happen automatically if you just change the database classes read in during phase 1).
3. The Business Layer never calls EF’s .SaveChanges() to persist data

The business layer should not know about persisting/updating data. I therefore design the business layer to call .Add() or .Remove() on the EF collections, but the final update is triggered by the Service Layer after the business process returns.
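
Here is a minimal sketch of that shape (the Order/LineItem/Product classes and names are invented for illustration, not taken from any of my libraries). The business class does the read, the pure logic and the .Add(), and leaves SaveChanges to the Service Layer:

//Hypothetical business class following the three phases described above
public class PlaceOrderAction
{
    private readonly MyDbContext _db;
    public PlaceOrderAction(MyDbContext db) { _db = db; }

    public Order DoAction(int customerId, ICollection<int> productIds)
    {
        //Phase 1: read in the data the business logic needs
        var products = _db.Products
            .Where(p => productIds.Contains(p.ProductId)).ToList();

        //Phase 2: pure business logic working on the data classes
        var order = new Order
        {
            CustomerId = customerId,
            LineItems = products
                .Select(p => new LineItem { ProductId = p.ProductId, Price = p.Price })
                .ToList()
        };

        //Phase 3: add the new entry - but do NOT call SaveChanges here
        _db.Orders.Add(order);
        return order;
    }
}

//The Service Layer triggers the single commit after the business method returns, e.g.
//var order = new PlaceOrderAction(db).DoAction(customerId, productIds);
//db.SaveChanges();   //one commit, with any validation errors handled here, not in the business layer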

This sounds subtle, but it has proved to be really useful. I go into this in more detail in my newer article Architecture of Business Layer working with Entity Framework, but here are three good reasons why it helps.

  • The single big commit meant that the modelled data was either there or not – no cleaning up if the business layer had done part of the job and committed it but a later part failed.
  • The Business Layer does not have to handle database validation errors, which can occur when ‘SaveChanges’ is called. The Service Layer does that, which is better placed to deal with any errors.
  • The business layer was easier to test as it added data to the EF collections, but didn’t change the database.

Conclusion (UPDATED 2015)

Doing all this is hard work, especially building the GenericServices library, but I really do think this will speed up future development. I built the SampleMvcWebApp fairly quickly, but that was because I was developing GenericServices.

However when I built Complex.SampleMvcWebApp, which is a much more representative application, it only took two weeks, which I thought was fantastic. Have a look at the site – it’s fairly complex in terms of data manipulation even if I didn’t spend time on the visual design. NOTE: I have not implemented any business logic at all in the sample. If I had, it would have taken a lot longer, because business logic is by its very nature specific to a project.

The things I immediately noticed when swapping to the new approach are:

  • The separation of concerns is much better, with the EF code more focused to where it is needed.
  • Creating a good repository pattern is hard and I used to find myself fiddling with the repository implementation during the project as I hit new issues. That is all gone.
  • Using EF directly allows me to use the full range of commands, like .Load() to load relationships that are not currently loaded. I now realise that when I used a repository pattern I was having to tweak the code to get round things that EF had commands to solve.

The primary helper is GenericServices, which as of 2015 is released on NuGet – see GenericServices on NuGet. Without GenericServices it would be a hard slog to build the Service Layer. What I noticed GenericServices did was:

  • Writing the DTOs is all about getting the right names/properties. Everything else is handled by GenericServices. So much quicker to write and more robust.
  • If there was a special case then it was easy to override one of the GenericServices DTO virtual methods. As part of the exercise of building Complex.SampleMvcWebApp I made the method names more obvious, to stop me having to look them up every time.
  • The MVC Controller code is now the same for all but the most complex situations, with just the data classes changing. This makes writing the MVC side much easier and less error prone. I plan to create some scaffolding, but am currently waiting for MVC 6 (vNext), which has new scaffolding options.

I have also written a private library called GenericActions to help with calling business methods, but that is a subject of another article.

I hope that is helpful to people. Happy coding!

Analysis of Entity Framework 6 async performance

Last Updated: July 29, 2014 | Created: July 17, 2014
Quick Summary
This post is an appendix to an article I am writing for the Simple-Talk blog site. It contains the detailed results of side-by-side comparisons of sync and async versions of the Entity Framework (EF) 6 data accesses. This shows that async EF commands are not that much slower than normal sync commands.

I am writing an article called ‘The .NET 4.5 async/await Commands in Promise and Practice‘ for the Simple-Talk blog. In this I look at the whole area of the async/await commands for tasking.

In putting this article together I ran extensive side-by-side comparisons of normal, synchronous Entity Framework (EF) commands and the new async versions of the same commands. I found some surprising results which I think others might be interested in.

The Simple-Talk article wasn’t the right place to put all this technical detail so I have written this blog as an appendix for those of you that would like to drill-down into the detail. I think you will find the results interesting.

Overview of the test database

I am building a .NET library called GenericServices (see previous blog post on this) and as the test database I have a simple blog site with three classes in it: Blog, Post and Tag.

(Diagram: the three data classes – Blog, Post and Tag – and their relationships.)

Most of the tests are done on the Post class, which is the most complex to update, and deleting it does not cause cascade deletes. Also, for update I assume a disconnected update, i.e. as would happen on a web site, where the original Post is read and then specific properties are updated before they are written back to the database.

Test 1. Raw EF performance

The first test results I will share with you are the raw speeds of the sync and async versions of the EF commands. This was done inside an NUnit test running inside Visual Studio 2013 on my development PC, which has an Intel i7 processor clocked at 3.31GHz. The database is localDb running SQL Server 2012. I ran 1000 tests on a database filled with 1000 entries and averaged the time for each command.

You can see the unit test code here (I used the method Perf21NCompareEfSyncAndAsync1000Ok, around line 61) and the actual EF commands here (note: GenericServices is an open-source project).
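
For context, the kind of sync/async pair being compared looks roughly like this (a simplified sketch, not the actual test code):

//Sync version - the thread is blocked while the database works
var postsSync = db.Posts.ToList();

//Async version - the thread is released back to the pool while waiting
//(ToListAsync comes from System.Data.Entity in EF6 and must be awaited inside an async method)
var postsAsync = await db.Posts.ToListAsync();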

Let me list the results first and then draw some conclusions from this.

Test          Sync (ms)   Async (ms)   Diff   Notes
List, Direct  2.80        7.80         279%   Just reads Post class
List, DTO     16.80       21.00        125%   Reads Post, Blog and Tags
Create        10.40       8.80         85%    Reads Tags and write Post
Update        15.70       9.70         62%    Reads Post, Tags and write Post
Delete        0.90        1.10         122%   Simple state update

Analysis of raw EF performance

  1. Async does not have much of an overhead
    I was surprised that the async versions were so fast. Lots of blog posts warn about async being slow, but on the simplest method, which was listing the Post class, it was only 5ms slower. That, I think, is impressive. However in the unit tests the context it was saving was small (see my Simple-Talk article to learn more about the context) and EF also caches the context, so one-off commands might take longer. I look at single reads later in the real-world section below.
  2. Some of the async commands are faster!?
    You can see that the async versions of the create and the update are faster (see the Diff percentages). Why is that? I am not totally sure, but I think it is because there are multiple database accesses (see the Notes column) and the async version manages to overlap these. If someone has a better explanation then please let me know.

Test 2. Real-world performance on ASP.NET MVC5 site

Raw figures like that are interesting, but it’s real-world performance that matters. I have a companion MVC5 web site called SampleMvcWebApp, also open-source, which I try things out on. This has the same Blog, Post, Tag format described at the start, but with only a few entries (default 17). I have this MVC web app on different hosting environments, plus internally:

  1. A low-cost shared site running Windows Server 2012 R2 through a UK company called WebWiz. I would recommend WebWiz (no, they are not paying me) as they seem to support the latest Microsoft environments quickly (they were one of the first to support SQL Server 2012 in the UK) and they are relatively cheap.
  2. A Windows Azure account where I can change the level of hosting performance.
  3. Locally on my development server.

Try this yourself!
The SampleMvcWebApp MVC5 web site is live and available for you to access and play with. Its address is http://samplemvcwebapp.net/

My tests are done using ab, the Apache HTTP server benchmarking tool. This allows me to load the site and also get averaged performance figures. I used the benchmark tool to read the posts list (17 entries) in two modes: a) one user trying 100 times, b) 50 users trying twice, all at the same time. The results are:

Host / action                       Sync (ms)   SD (ms)   Async (ms)   SD (ms)
WebWiz
 – List Posts ave(100), one user    90          4         95           9
 – List Posts ave(100), 50 users    1065        450       1050         450
Azure, standard
 – List Posts ave(100), one user    110         11        120          50
 – List Posts ave(100), 50 users    1200        500       1200         500

Notes:
– The Sync (ms) and Async (ms) columns are the average total time to get one list result from the web site using a synchronous and async EF method respectively.
– The SD (ms) column holds the standard deviation and shows the amount of variability of the results.
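
(For reference, a run like the 50-user case can be reproduced with a command along the lines of ab -n 100 -c 50 http://samplemvcwebapp.net/Posts, where -n is the total number of requests and -c is the number of concurrent users; the one-user case simply uses -c 1. Treat this as an illustrative example rather than the exact command line I used.)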

Analysis of MVC web site performance

    1. Async listing is slightly slower for one user
      There is a small difference, less than 10%, between the sync and async versions. However this must all be due to the use of async as the data is the same. It amounts to 5 to 10 milliseconds, which is about the same as we saw in the raw EF figures earlier.
    2. Using Async seems to have had no effect on the scalability of the web site.
      You will see that the figures for 50 simultaneous users are almost the same for both sync and async commands. That makes sense as the database is small, the query fairly simple and there is almost no waiting for the database (the raw figures for the database access part on small databases are 2 to 3 milliseconds). Clearly with much more complex data accesses async would start to pull away in terms of scalability of the web site, because the thread would be freed up for longer while the database is doing its work.
    3. Most of the time is spent in the MVC/Http part of the process.
      If you look at the timings for one request captured using Google Chrome’s developer tools you can see the parts of the timeline below:

(Chrome developer tools timeline for a single List Posts request on the WebWiz hosting.)

    This clearly shows what I said in the Simple-Talk article – that optimising/caching the other files, like CSS and JavaScript files, would have far more effect than worrying about whether to use a sync or async data request.

I will end this section by saying it is quite difficult to test scalability on live web sites because they are designed to take a lot of load. You might like to look at Steven Sanderson’s video about threads and scalability, which has an excellent demo: http://channel9.msdn.com/Events/TechDays/Techdays-2012-the-Netherlands/2287.

Overall Conclusions

EF async data accesses have only a small overhead over the standard, sync commands. That is big news, and contrary to some of the older documentation that is out there. So feel free to use async EF commands in your next application.

Happy coding.

Alpha release of GenericServices

Last Updated: August 15, 2014 | Created: July 3, 2014
Quick Summary
This post introduces my new GenericServices framework, designed to lighten the development load of building a service layer.
This post summarises what GenericServices is about and the motivation behind it. It also provides links to code source and the example web site.

I am pleased to announce the alpha release of my new Open Source project called GenericServices, available on GitHub. It will also be available on NuGet when it has a more stable interface (watch this space).

GenericServices is a .NET class library which helps a developer build a service layer, i.e. a layer that acts as a facade/adapter between your business/data layers and your User Interface or HTTP service.

It does this by providing standard database CRUD (Create, Read, Update and Delete) methods and a standard way of calling business methods, each with clear extension points. The aim is to cut out the standard boiler-plate code so the user only has to write the data or business specific code.

What application frameworks can GenericServices work with?

GenericServices is designed to work as a service layer framework in any .NET application, such as ASP.NET MVC, Windows Azure Web apps, etc. It assumes a disconnected state model, e.g. a web site or HTTP RESTful service, where the read of data prior to update is disconnected from the update of the data.

I have also assumed a horizontal scaling model, e.g. scaling by having multiple web instances, as this is how Azure and most web sites scale. I have therefore not thought about serialisation of objects for vertical scaling, i.e. where each layer of the application runs on a separate server and remote calls are used between layers.

GenericServices uses the following .NET frameworks/systems.

  • It needs .NET 4.5 because it implements all commands in both normal and the new async/await Tasking format introduced in .NET 4.5
  • It uses Entity Framework 6 (EF6) for database access, again because it supports async commands.
  • It also makes use of the open source AutoMapper library for transforming data and business classes to/from the user interface oriented DTOs.

What is the motivation behind building GenericServices?

I design and develop fairly complex analysing, modelling and data visualisation web applications (see Spatial Modeller). These require a Domain-Driven Design approach to the data and business objects, while the visualisation needs a comprehensive user interface, which can include a Single Page Application (SPA) fed by a REST interface. This means there is often a mismatch between the business/data layer classes and the data needed by the user interface and SPA.

My experience is that the Service Layer, plus Data Transfer Objects (DTOs), is the best way to solve this mismatch. However I have found that the service layer is often filled with lots of code that is very similar, with just the data being different. It’s also pretty boring to write. I therefore researched a number of approaches to handle the standard code and finally came up with a solution using C#’s Generic classes. I have therefore called it GenericServices.

Q: How would I use it? A: Look at the example web site!

I have taken the trouble to produce a live example web site. This has examples of all the GenericServices commands, with documentation explaining how they work – for example see the Posts Controller code explanation page.

As well as showing the use of GenericServices it also contains another companion project of mine: code for executing long-running methods with progress and cancellation controls in ASP.NET MVC using SignalR.

This web site is an ASP.NET MVC5 application and is itself an open source project called SampleMvcWebApp. The source is available on GitHub.

Feedback

While in the alpha phase I suggest you leave comments here or contact me via this web site’s Contact Page. Once it is up on NuGet I will try and set up a Stack Overflow group and monitor that.

Mocking SignalR Client for better Unit Testing

Last Updated: January 22, 2015 | Created: June 10, 2014
Modal dialog showing task progress, using SignalR.

Why I needed to Mock SignalR

I build geographic modelling applications and they have lots of tasks that take a long time, sometimes many minutes, to run. I am currently developing an open source framework to help speed up the development of such ASP.NET MVC applications. Part of that framework is therefore a set of modules for handling long running processes, with a progress bar, messages and user cancellation. The image above shows a simple example of a modal panel with a green progress bar at the top and a list of sent messages as the task works its way through the calculations (well, in this case it is test code so the messages aren’t that exciting).

I have used SignalR for the communication channel between the JavaScript and the MVC5 server. I have found SignalR to be excellent and it makes two-way comms really easy.

However my application is fairly complicated because of all the things that can happen, like errors, the user wanting to cancel, losing the connection, etc. In particular the JavaScript client uses a state machine to handle all the options, and that needs checking. For this reason I wanted to unit test both ends. (Read my blog on Unit Testing for an in-depth look at how I use Unit Testing.)

The C# end was fairly straightforward to test, as it was partitioned well. However for the JavaScript end I needed to mock the SignalR JavaScript library. I could not find anything online so I wrote something myself.

Mocking the SignalR JavaScript Client

It turns out that it wasn’t that hard to mock the SignalR Client, although I should say I don’t use the autogenerated SignalR hub scripts, but use createHubProxy(‘HubName’) as I find that easier to manage than loading a dynamically created script. I have listed the mocked SignalR Client code below:

//This is a mock for a small part of SignalR's javascript client.
//This code does not mock autogenerated SignalR hub scripts as the
//ActionRunner code uses the connection.createHubProxy('HubName') method,
//followed by .on or .invoke to setup the receive and send methods respectively

var mockSignalRClient = (function ($) {

    var mock = {};

    //first the items used by unit testing to see what has happened
    mock.callLog = null;
    mock.onFunctionDict = null;
    mock.doneFunc = null;
    mock.failFunc = null;
    mock.errorFunc = null;
    //This logs a string with the caller's function name and the parameters
    //you must provide the function name, but it finds the function arguments itself
    mock.logStep = function (funcName) {
        var log = funcName + '(';
        var callerArgs = arguments.callee.caller.arguments;
        for (var i = 0; i < callerArgs.length; i++) {
            log += (typeof callerArgs[i] === 'function') ? 'function, ' : callerArgs[i] + ', ';
        };
        if (callerArgs.length > 0)
            log = log.substr(0, log.length - 2);
        mock.callLog.push(log + ')');
    };
    mock.reset = function() {
        mock.callLog = [];
        mock.onFunctionDict = {}
        mock.doneFunc = null;
        mock.failFunc = null;
        mock.errorFunc = null;
    };

    //doneFail is the object returned by connection.start()
    var doneFail = {};
    doneFail.done = function (startFunc) {
        mock.logStep('connection.start.done');
        mock.doneFunc = startFunc;
        return doneFail;
    };
    doneFail.fail = function(failFunc) {
        mock.logStep('connection.start.fail');
        mock.failFunc = failFunc;
        return doneFail;
    };

    //Channel is the object returned by connection.createHubProxy
    var channel = {};
    channel.on = function (namedMessage, functionToCall) {
        mock.logStep('channel.on');
        mock.onFunctionDict[namedMessage] = functionToCall;
    };
    channel.invoke = function (actionName, actionGuid) {
        mock.logStep('channel.invoke');
    };

    //connection is the object returned by $.hubConnection
    var connection = {};
    connection.createHubProxy = function (hubName) {
        mock.logStep('connection.createHubProxy');
        return channel;
    };
    connection.error = function (errorFunc) {
        mock.logStep('connection.error');
        mock.errorFunc = errorFunc;
    };
    connection.start = function () {
        mock.logStep('connection.start');
        return doneFail;
    };
    connection.stop = function () {
        mock.logStep('connection.stop');
        return doneFail;
    };

    //now we run once the method to add the hubConnection function to jQuery
    $.hubConnection = function() {
        return connection;
    };

    //Return the mock base which has all the error feedback information in it
    return mock;

}(window.jQuery));

I think you will find most of this fairly easy to understand. The first part of the listing – the mock.callLog, mock.onFunctionDict etc. variables and the logStep/reset methods – holds the items that the unit tests use to see what has happened. The rest of the code implements the methods which mock the SignalR methods I use in my code.

How did I use this Mock SignalR?

SignalR works by adding .hubConnection() to jQuery, so it was simple to make the mock SignalR client do the same (see the $.hubConnection assignment near the end of the listing above). My actual code checks that jQuery is present and then that $.hubConnection is defined, which ensures SignalR is loaded. Here is a piece of code from my ActionRunner.comms.js that does the initial setup, so you can see how it uses SignalR and therefore what I needed to mock.

//This deals with setting up the SignalR connections and events
function setupTaskChannel() {

    actionRunner.setActionState(actionStates.connectingTransient);

    actionRunner.numErrorMessages = 0;

    //Setup connection and actionChannel with the functions to call
    var connection = $.hubConnection();

    //connection.logging = true;
    actionChannel = connection.createHubProxy('ActionHub');
    setupTaskFunctions();

    //Now make sure connection errors are handled
    connection.error(function(error) {
        actionRunner.setActionState(actionStates.failedLink);
        actionRunner.reportSystemError('SignalR error: ' + error);
    });
    //and start the connection and send the start message
    connection.start()
        .done(function() {
            startAction();
        })
        .fail(function(error) {
            actionRunner.setActionState(actionStates.failedConnecting);
            actionRunner.reportSystemError('SignalR connection error: ' + error);
        });
}
(Screenshot: Jasmine Unit Test checking what SignalR functions were called.)

Using this mocking framework

There are two main ways I use it. Firstly you get a log of each method called, which helps ensure the right methods are called.

Secondly most of the calls to SignalR link functions to certain SignalR events. By capturing these functions the unit test can call them to simulate SignalR messages, errors etc. That allows a very good level of checking.

Getting the whole picture

In case you are interested in downloading the code or seeing how it was used, here are a series of links to various bits of code. These are taken from an open source project that I am currently working on, so the code is subject to change. I have listed all the various parts of the testing. UPDATE: These links had broken because the git repository had changed – sorry about that. Now fixed.

Conclusion

As I said in my previous post ‘Unit Testing in C# and JavaScript‘ I find mocking in JavaScript very easy and helpful. The ActionRunner is complex enough to need unit testing and I found mocking the various parts I wanted to replace fairly quick to implement.

I hope this helps you with SignalR and encourages you to mock other frameworks to help you test more easily. Happy coding.

Reflections on Unit Testing in C# and JavaScript

Last Updated: June 8, 2014 | Created: May 31, 2014

It has been a while since I wrote my first blog post called ‘Why I’m not scared of JavaScript any more‘ and since then I have written a fair amount of JavaScript. One of the things that I have found interesting is how I use different approaches to Unit Testing in each language. I’m not saying I have the ‘best’ way of Unit Testing each language, in fact I think some areas are a bit weak, but maybe you will find my experiences and reflections useful.

If you want to get to the meat of the post skim the next three paragraphs that set the scene and jump to ‘The differences in how I Unit Test between the two languages’ section.


First, respect to Gilles Ruppert…

Before I start I should say that I was heavily helped on the JavaScript side by an excellent programmer called Gilles Ruppert. As I was new to JavaScript I sought help, and my son recommended Gilles. Gilles worked for me for about 10 weeks and set up the architecture of the single page application that I needed. He also set up an excellent JavaScript development environment using the Karma test runner, the Mocha test framework, Expect assertions and finally Sinon stubs/spies/mocks. I am building on the shoulders of a giant.

Setting the scene

I have just finished the first release of Spatial Modeller™, a medium sized ASP.NET MVC4 web application written in C#, plus a large single page application written in JavaScript using backbone with marionette.

When Gilles finished we had just under 300 JavaScript Unit Tests, a handful written by me. Now I have added a lot more code and written an additional 400+ Unit Tests, taking the JavaScript tests to over 700.

On the C# side I use NUnit unit testing framework with a small amount of Moq for mocking etc. I use Resharper for running the testing inside Visual Studio. I have used this combination for some years.

Numerically I have fewer Unit Tests in C# than in JavaScript – currently C# has just under 500. However each C# test is often much more complicated and tests many items per test. This difference in approach is one of the reasons I wrote this blog, so read on to find out why.

My style of Unit Testing

I really believe in Unit Testing and I don’t think I have a project that doesn’t use Unit Testing. However I am not a fan of Test Driven Development (TDD), as I have found that way I don’t come up with a coherent design. Writing Unit Tests is not an excuse for not doing some good designing of the system first.


The differences in how I Unit Test between the two languages

After maybe 10 months of JavaScript Unit Testing (and 5+ years of C# Unit Testing) I find it really interesting that I use different styles between the two. Here are the raw differences and I’ll make some conclusions at the end.

1. In C# my first Unit Test is the compiler

Because C# is typed the compiler gives me lots of feedback. Therefore I may refactor code before I have even run it if I see opportunities to improve the code. This doesn’t happen with JavaScript, partly because JavaScript is not typed in the same way, so tools like JSHint cannot give such comprehensive feedback.

(hint: In Visual Studio you can install Web Essentials which will then run JSHint and JSCS when a JavaScript file is closed. It may not be as good as the C# compiler at spotting things, but it can help spot silly mistakes.)

This means I am willing to write more C# code before I Unit Test than I would with JavaScript.

2. Large JavaScript is much easier to partition than C#

The ‘up’ side of the less tight checking in JavaScript is that it is much easier to build in smaller chunks. The ‘interface’ between these parts is often an object which, by the nature of JavaScript, is not directly linked between modules. As well as making mocking really easy it seems to help me think about each part separately.

In C# I use interfaces and layers to separate chunks of code, but for something complex with five or more significant classes I tend to think of them as a whole. The typed nature makes them more ‘linked’ than the JavaScript equivalent, and mocking is harder.

3. The unit testing frameworks make a BIG difference

I have found that test frameworks like Mocha and Jasmine have some features that encourage small tests. This is because these frameworks support ‘nested setups’, which other frameworks like NUnit don’t have. Let me explain what ‘nested setups’ are and why they encourage smaller, one-item tests.

For most tests we need to set up some instances or data before we run the test. In NUnit we can run a setup method a) once at the start of the test class and/or b) once before each test is run. This is fine, but I find that I often need to run something specific before each test, which in NUnit you do at the beginning of the specific test. See the example below where I run a ‘once at start’ method called ‘SetUpFixture’ and then an additional setup phase at the start of the specific test ‘CheckValidateTagError’.

[TestFixtureSetUp]
public void SetUpFixture()
{
    using (var db = new SampleWebAppDb())
        DataLayerInitialise.ResetDatabaseToTestData(db, TestDataSelection.Simple);
}

[Test]
public void CheckValidateTagError()
{
    using (var db = new SampleWebAppDb())
    {
        //SETUP
        var existingTag = db.Tags.First();

        //ATTEMPT
        var dupTag = new Tag {Name = "duplicate slug", Slug = existingTag.Slug};
        db.Tags.Add(dupTag);
        var status = db.SaveChangesWithValidation();

        //VERIFY
        status.IsValid.ShouldEqual(false);
        status.Errors.Count.ShouldEqual(1);
        status.Errors[0].ErrorMessage.ShouldEqual("The Slug on tag 'duplicate slug' must be unique.");
    }
}

This is fine, but I find that the individual setups can become quite long. I then have three choices if I have multiple things to check: a) duplicate the individual setup code in each test (not good for DRY), b) write a small helper method which encapsulates the setup code (takes time and hides the setup), or c) test multiple items in the one test, which is what I have done above.

Both the Mocha and Jasmine JavaScript testing frameworks allow nesting of the setups, so in my last example I could have an outer group with the ‘SetUpFixture’ in it and then a small nested group, with a setup just for CheckValidateTagError, and then three separate tests for the three parts.

Here is an example from my actual code with some nested test groups, which Mocha does with the ‘describe’ command:

... various require code

describe('views/InfoPanel', function () {
    afterEach(function () {
        delete this.model;
        delete this.view;
    });

    describe('single item to display, no grades', function () {
        beforeEach(function () {
            this.layerInfo = helpers.loadFixture('/api/schemes/1').layerInfos[0];
            this.members = helpers.loadFixture('/api/layers/1/members');
            this.model = new InfoPanelModel({ selectedMembers: this.members });
            this.view = new InfoPanelView({ model: this.model });
        });
        afterEach(function () {
            delete this.layerInfo;
            delete this.layer;
            delete this.members;
        });

        it('should be a Backbone.View', function () {
            expect(this.view).to.be.a(Backbone.View);
        });

        describe('after showing view', function () {
            beforeEach(function () {
                this.mainRegion.show(this.view);
            });

            it('should have something on view', function () {
                expect(this.view.$el).to.have.length(1);
            });

            it('title in window header should start with layer name', function () {
                ... etc.
            });

You can see the first setup, the ‘beforeEach’ inside the ‘single item to display, no grades’ group, and then the nested ‘beforeEach’ inside the ‘after showing view’ group. This works really well and is the reason why my JavaScript Unit Tests almost always check one item per test. I really like Mocha’s approach and miss it in NUnit.

4. Mocking databases is hard!

One thing that definitely affects me is that the database side is hard to mock. I have used a number of ORMs, and direct T-SQL, and there isn’t a nice way to replace the real code with something for testing that will catch the same errors. Foreign keys, relational fixup etc. are really hard to mock (just have a look at my post Updating a many to many relationship in entity framework to see the complications that arise).

This means some of my Unit Tests in C# are more integration tests, as I use the real database layer but with a test database loaded with test data. There is a cost to that, but my experience is anything else lets errors through.

JavaScript using backbone.js does access data, but in a more controlled way, and Gilles set up the classic ‘fixtures’ and filled them with test data created by the C# application. That makes JavaScript testing much easier, but only because the server is dealing with the complex parts.

5. PS: You must be able to debug inside Unit Tests

It goes without saying that Unit Tests are for finding bugs before you use the code in the real application. Therefore you MUST be able to debug code, with breakpoints and variable inspection etc., when you are running a unit test.

I mention this only because Resharper, which I find really useful, is very bad in this area when unit testing JavaScript. Resharper makes running unit tests really easy, especially for JavaScript using QUnit or Jasmine. However I have found it almost impossible to use it to debug Jasmine tests (and I assume QUnit tests) when running in the Resharper harness.

However the Mocha JavaScript test environment that Gilles set up is excellent on that score. It allows running with a connected browser, use of the debugger; statement to cause pauses when the debugger is open, and the .only() method to allow only certain tests to be run.

How this affects my Unit Testing

  1. My JavaScript Unit Tests are shorter and just test one item per test.
  2. I write new Unit Test groups more often in JavaScript than in C#.
  3. My C# Unit Tests are often more complex and often test multiple items per test. This is because:
    1. The setup code is complicated or time consuming.
    2. NUnit does not have nested setups like Mocha does
      (see The unit testing frameworks make a BIG difference above).
  4. I often write more C# code before writing Unit Tests, as the compiler catches a lot of ‘silly’ mistakes in C# which can get through in JavaScript.
  5. C# code is more ‘linked together’ than JavaScript, which affects mocking and designing.

I know that testing multiple items in one test, point 3 above, is often seen as ‘wrong’ by some, but the cost of both the code to set it up and the time to run the test are factors I have weighed up in coming to my current style.

I have more C# code than JavaScript code by maybe 3 to 1. My 500 C# Unit Tests take nearly 2 minutes to run while my 700 JavaScript Unit Tests take 30 seconds.

Conclusion

I like both C# and JavaScript. They each have their strengths and weaknesses. I also could not work on a project without the feedback that Unit Tests give me on the quality of the software.

With all of this in mind it is worth thinking about how you Unit Test in each language. The language, the need, the project and the environment will change the way Unit Tests can and should be approached. Happy programming.

Updating a many to many relationship in entity framework

Last Updated: March 2, 2017 | Created: May 22, 2014
Quick Summary
I found conflicting information on how to update many-to-many relationships in entity framework. I therefore decided to research the problem properly. This post shares with you my research.
I believe I now understand how entity framework works with automatic many-to-many relationships and how to implement my own linking table if I need to.

I had problems working out the best way to update a many to many relationship in Entity Framework (referred to as EF). I ended up doing some research and I thought others might like to see what I found out.

Firstly the environment I am working in:

  • I am using EF 6.1 in Code First mode. I am pretty sure what I have found applies across the board, but I have only tested on EF 6.1
  • I am working in a web application (MVC 5) so I have to handle disconnected classes.

If you just want the quick answer then go to the Part 1 Conclusion. The Conclusions also have a link to a live web site where you can see a many-to-many relationship used for real (with access to the source too). However it’s worth reading some of the examples so you understand why it works.

UPDATE 2017: I have written a new version of this article for Entity Framework Core (EF Core) to celebrate the release of my book Entity Framework Core in Action.

EF Core does NOT work the same way as EF6.x and the new article provides information on how to handle many-to-many relationships in EF Core.


Part 1: Using EF’s automatic linking of many-to-many relationships

In this example I have simply linked the Tag and the Post by having ICollection<Tag> Tags in the Post class and ICollection<Post> Posts in the Tag class. See the navigation properties in Tag and Post in the diagram below:

Tag, Post and Blog Entity Framework classes (Blog dimmed as not important).

As explained in the Microsoft tutorial, EF will create a linking table called TagPosts or PostTags. This link table, which you never see, provides the many-to-many linking. This way a post can have none to many tags and a tag can be in none to many posts.
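
As a minimal sketch (the property lists are shortened and the non-key properties are only illustrative), the two classes just carry collection navigation properties and EF infers the hidden join table from them:

//Simplified versions of the two classes - EF creates the hidden link table
//from the pair of many-to-many collection navigation properties
public class Post
{
    public int PostId { get; set; }
    public string Title { get; set; }
    public ICollection<Tag> Tags { get; set; }
}

public class Tag
{
    public int TagId { get; set; }
    public string Slug { get; set; }
    public ICollection<Post> Posts { get; set; }
}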

So the action I want to perform is to change the tags on a post. Sounds simple, and it is in this example if you watch out for a few things.

First answer: update many-to-many relationship in connected state

Below are two Unit Tests which don’t need to worry about disconnected classes. These are good starting points as it’s important to know how EF handles the simplest case. The first test adds a tag to a post while the second one replaces the current tags with a new set. Note that the first post starts with two tags: ‘good’ and ‘ugly’.

[Test]
public void Check25UpdatePostToAddTagOk()
{
    using (var db = new SampleWebAppDb())
    {
        //SETUP
        var badTag = db.Tags.Single(x => x.Slug == "bad");
        var firstPost = db.Posts.First();

        //ATTEMPT
        db.Entry(firstPost).Collection( x => x.Tags).Load();
        firstPost.Tags.Add( badTag);
        db.SaveChanges();

        //VERIFY
        firstPost = db.Blogs.Include(x => x.Posts.Select(y => y.Tags)).First()
                            .Posts.First();
        firstPost.Tags.Count.ShouldEqual(3);
    }
}

[Test]
public void Check26ReplaceTagsOk()
{
    using (var db = new SampleWebAppDb())
    {
        //SETUP
        var firstPost = db.Posts.First();
        var tagsNotInFirstPostTracked =
             db.Tags.Where(x => x.Posts.All(y => y.PostId != firstPost.PostId))
                    .ToList();

        //ATTEMPT
        db.Entry(firstPost).Collection(x => x.Tags).Load();
        firstPost.Tags = tagsNotInFirstPostTracked;
        db.SaveChanges();

        //VERIFY
        firstPost = db.Blogs.Include(x => x.Posts.Select(y => y.Tags)).First()
                            .Posts.First();
        firstPost.Tags.Count.ShouldEqual(1);
    }
}

Now, you should see the two important lines – the db.Entry(firstPost).Collection(x => x.Tags).Load() calls in each test – which load the current Tags in the post. This is really important as it loads the post’s current tags so that EF can track when they are changed. There are a number of alternatives to using .Load() that would work:

  1. Add the appropriate .Include() when loading the posts, e.g.
    var firstPost = db.Posts.Include( post => post.Tags).First()
  2. Make the Tags property in the Post class virtual, e.g.
    public virtual ICollection<Tag> Tags { get; set; }

Having used .Load(), .Include() or a virtual property, EF tracks the data and then does all the work to add or remove the rows in the hidden link table. This is very clever and very useful.

I wanted to really prove to myself that my findings were correct, so I wrote another Unit Test to test the failure case. The unit test below shows conclusively that if you don’t load the current Tags you get the wrong result. As I said earlier the first post started with the ‘good’ and ‘ugly’ tags and should have ended up with ONLY the ‘bad’ tag. However the Unit Test shows it ended up with all three.

[Test]
public void Check05ReplaceAllTagsNoIncludeBad()
{
    using (var db = new MyDbContext())
    {
        //SETUP
        var snap = new DbSnapShot(db);
        var firstPost = db.Posts.First();
        var badTag = db.Tags.SingleOrDefault(x => x.Slug == "bad");

        //ATTEMPT
        firstPost.Tags = new List<Tag> { badTag };
        db.SaveChanges();

        //VERIFY
        snap.CheckSnapShot(db, 0, 1);
        var readPost = db.Posts.Include(x => x.Tags)
                         .SingleOrDefault(x => x.PostId == firstPost.PostId);
        CollectionAssert.AreEquivalent(new[] { "good", "bad", "ugly" },
                                       readPost.Tags.Select(x => x.Slug));
    }
}

As you can see from the above example it ended up with all three, which is not the right answer.

Second answer: update many-to-many relationship in disconnected state

When working with a web application like MVC an update is done in two stages. Firstly the current data is sent to the user, who updates it. Secondly, when the user presses submit, the new data is sent back, but it’s now disconnected from the database, i.e. EF is not tracking it. This makes for a slightly more complicated case, but it is still fairly easy to handle.

Below is my method for updating the posts. In this case I have filled a MultiSelectList with all the tags and it returns the ids of the tags that the user has chosen. I should also point out that I use the same method for create and update, hence the test on post.PostId to see if I need to load the current tags collection.

private void ChangeTagsBasedOnMultiSelectList(SampleWebAppDb db, Post post)
{
   var requiredTagIds = UserChosenTags.GetFinalSelectionAsInts();

   if (post.PostId != 0)
       //This is an update so we need to load the tags
       db.Entry(post).Collection(p => p.Tags).Load();

   var newTagsForPost = db.Tags
                          .Where(x => requiredTagIds.Contains(x.TagId)).ToList();
   post.Tags = newTagsForPost;
}

The important thing is that I loaded new tags from the database so they are tracked.

Conclusion from part 1

If you want to update an EF provided many-to-many link then:

  1. Pick one end of a many-to-many relationship to update. EF will sort out the other end for you.
  2. Make sure that the collection you will change is loaded, either by putting virtual on the property, using .Include() in the initial load or using .Load() later to get the collection.
    Note: .Include() is the best performing of the three as it means the data is loaded in the initial SQL command. The other two, .Load() and virtual, require a second SQL access.
  3. Make sure that any new items, e.g. Tags in my example, are loaded as tracked entities.
    This is the normal case, but in the disconnected state, i.e. in a web app, that might not be true (see examples above).
  4. Call EF’s .SaveChanges() to save the changes.

Because you have tracked entities, EF’s change tracking will notice the changes and sort out the adding or removing of rows in the hidden link table for you.

Live web site with many-to-many relationship update

As part of my GenericServices open-source project I have built two example web sites. One of them has a simple list of blog Posts, each of which has one or many Tags. The Posts and Tags are linked by a many-to-many table. This web site is live and you can try it yourself.

  • Edit a post at http://samplemvcwebapp.net/Posts and change the tags – Note: if accessed from desktop then you need to use control-click to select multiple tags.
  • You can also see the code that updates this many-to-many relationship via the open-source project SampleMvcWebApp – see the code right at the end of the DetailPostDto.cs class in the method ChangeTagsBasedOnMultiSelectList.

Part 2: Using your own link table for many-to-many relationships

There are cases when you want to have your own link table, possibly because you want to include some other property in the link table. Below is a case where I have simply created my own link table with no extra properties. As you can see, the PostTagLink table has a row for every Tag/Post link, just like the hidden table that EF produced in the first instance. However now that we have produced this table ourselves we need to keep it up to date.

Tag, PostTagLink and Post classes.
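
A minimal sketch of what that linking class might look like (a guess based on the property names used in the unit tests below; the key and foreign-key configuration in your own code may well differ):

//Simplified explicit link class - one row per Post/Tag link
public class PostTagLink
{
    public int PostTagLinkId { get; set; }   //could equally be a composite PostId+TagId key

    //foreign keys, picked up by EF convention because they match the principal key names
    public int PostId { get; set; }
    public int TagId { get; set; }

    //navigation properties used in the code below
    public Post InPost { get; set; }
    public Tag HasTag { get; set; }
}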

So let us carry out the same type of Unit Test (connected) code and MVC (disconnected) code as we did in the first example.

First answer: update many-to-many relationship in connected state

The two unit tests below now show that we need to manipulate the PostTagLink table entries, not the navigation properties in the Post. After we have saved the changes, the Post’s AllocatedTags list will reflect the changes through EF’s relational fixup, which is done on any tracked classes.

[Test]
public void Check25UpdatePostToAddTagOk()
{
    using (var db = new SampleWebAppDb())
    {
        //SETUP
        var badTag = db.Tags.Single(x => x.Slug == "bad");
        var firstPost = db.Posts.First();

        //ATTEMPT
        db.PostTagLinks.Add(new PostTagLink { InPost = firstPost, HasTag = badTag });
        db.SaveChanges();

        //VERIFY
        firstPost = db.Posts.Include(x => x.AllocatedTags).First();
        firstPost.AllocatedTags.Count.ShouldEqual(3);
    }
}

[Test]
public void Check26UpdatePostToRemoveTagOk()
{
    using (var db = new SampleWebAppDb())
    {
        //SETUP
        var firstPost = db.Posts.First();
        var postTagLinksToRemove =
             db.PostTagLinks.First(x => x.PostId == firstPost.PostId);

        //ATTEMPT
        db.PostTagLinks.Remove(postTagLinksToRemove);
        db.SaveChanges();

        //VERIFY
        firstPost = db.Posts.Include(x => x.AllocatedTags).First();
        firstPost.AllocatedTags.Count.ShouldEqual(1);
    }
}

I think the code speaks for itself, i.e. you add or remove rows from the PostTagLinks table to change the links.

Second answer: update many-to-many relationship in disconnected state

Just like the first example, when using MVC I have filled a MultiSelectList with all the tags and it returns the ids of the tags that the user has chosen. So now I need to add/remove rows from the PostTagLinks table. However I do try not to change links that are still needed, hence I produce tagLinksToDelete and tagLinksToAdd, as it’s more efficient.

private void ChangeTagsBasedOnMultiSelectList(SampleWebAppDb db, Post post)
{
    var requiredTagIds = UserChosenTags.GetFinalSelectionAsInts();

    var tagLinksToDelete =
        db.PostTagLinks.Where(x => !requiredTagIds.Contains(x.TagId) && x.PostId == PostId).ToList();
    var tagLinksToAdd = requiredTagIds
        .Where(x => !db.PostTagLinks.Any(y => y.TagId == x && y.PostId == PostId))
        .Select(z => new PostTagLink {InPost = post, HasTag = db.Tags.Find(z)}).ToList();

    //We get the PostTagLinks entries right, which is what EF needs
    tagLinksToDelete.ForEach(x => db.PostTagLinks.Remove(x));
    tagLinksToAdd.ForEach(x => db.PostTagLinks.Add(x));
    //********************************************************************
    //If using EF 6 you could use the more efficient RemoveRange, e.g.
    //db.PostTagLinks.RemoveRange(tagLinksToDelete);
    //db.PostTagLinks.AddRange(tagLinksToAdd);
    //********************************************************************
}

Conclusion from part 2

If you have your own link table for handling many-to-many relationships then you need to:

  1. Add or remove entries from you link table, in my case called PostTagLinks.
  2. Make sure that any new items, e.g. Tag, added to your link table are loaded as tracked entities.
  3. Call EF’s .SaveChanges() to persist the changes.

Well, that took longer to write than I expected, but I hope it helps others in really understanding what is going on underneath EF’s many-to-many relationships. Certainly I now feel much more confident on the topic.

Additional note: You will see I use EF commands directly and do not use a repository or UnitOfWork pattern when accessing the database. You might like to read my post on ‘Is the Repository pattern useful with Entity Framework?‘ as to why I do that.

Is the Repository pattern useful with Entity Framework?

Last Updated: February 21, 2018 | Created: May 10, 2014
Quick Summary
This is, hopefully, a critical review of whether the repository and UnitOfWork patterns are still useful with the modern implementations of Microsoft’s Entity Framework ORM. I look at the reasons why people are suggesting the repository pattern is not useful and compare this with my own experience of using the repository and UnitOfWork patterns over many years.

This is a series:

  1. Part 1: Analysing whether Repository pattern useful with Entity Framework (this article).
  2. Part 2: Four months on – my solution to replacing the Repository pattern.
  3. UPDATE (2018): Big re-write to take into account Entity Framework Core, and further learning.

I have just finished the first release of Spatial Modeller™, a medium sized ASP.NET MVC web application. I am now undertaking a critical review of its design and implementation to see what I could improve in V2. The design of Spatial Modeller™ is a fairly standard four-layer architecture.

Software design of the Spatial Modeller web app (four layer design).

I have used the repository pattern and the UnitOfWork pattern over many years, even back in the ADO.NET days. In Spatial Modeller™ I think I have perfected the design and use of these patterns and I am quite pleased with how they help the overall design.

However we don’t learn unless we hear from others who have a counter view, so let me try and summarise the arguments I have seen against the repository/Unit of Work pattern.

What people are saying against the repository pattern

In researching as part of my review of the current Spatial Modeller™ design I found some blog posts that make a compelling case for ditching the repository. The most cogent and well thought-out post of this kind is ‘Repositories On Top UnitOfWork Are Not a Good Idea’. Rob Conery’s main point is that the repository & UnitOfWork just duplicate what Entity Framework (EF) DbContext gives you anyway, so why hide a perfectly good framework behind a façade that adds no value – what Rob calls ‘this over-abstraction silliness’.

Another blog is ‘Why Entity Framework renders the Repository pattern obsolete’. In this Isaac Abraham adds that a repository doesn’t make testing any easier, which is one thing it was supposed to do.

I should also say I found another blog, ‘Say No to the Repository Pattern in your DAL’, which says that using a repository removes access to Linq querying, the ability to include/prefetch, and async support (EF 6). However this is not true if your repository returns IQueryable items, as this allows all these features.

So, are they right?

Reviewing my use of repository and UnitOfWork patterns

I have been using repositories for a while now and my reflection is that when ORMs weren’t so good I really needed repositories.

I built a geographic modelling application for a project to improve HIV/AIDS testing in South Africa. I used LINQ to SQL, but because it didn’t support the spatial parts I had to write a lot of T-SQL stored procedures and use ADO.NET to access the spatial values. The database code was therefore a pretty complicated mix of technologies, and a repository pattern acted as a good façade to make it look seamless. I definitely think the software design was helped by the repository pattern.

However EF has come a long way since then. Spatial Modeller™ used EF 5, which was the first version to support spatial types (and enums, which is nice). I used the repository pattern and it worked well, but I don’t think it added as much value as in the South Africa project, as the EF code was pretty clean. Therefore I sympathise with the sentiments of Rob etc.

My views on the pros and cons of repository and UnitOfWork patterns

Let me try and review the pros and cons of the repository/UnitOfWork patterns in as even-handed a way as I can. Here are my views.

Benefits of repository and UnitOfWork patterns (with best first)

  1. Domain-Driven Design: I find well designed repositories can present a much more domain-specific view of the database. Commands like GetAllMembersInLayer make a lot more sense than a complex LINQ command. I think this is a significant advantage. However Rob Conery’s post suggests another solution using Command/Query Objects and references an excellent post by Jimmy Bogard called ‘Favor query objects over repositories’.
  2. Aggregation: I have some quite complex geographic data consisting of six or seven closely interrelated classes. These are treated as one entity, with one repository accessing them as a group. This hides complexity and I really like this.
  3. Security of data: one problem with EF is that if you load data normally and then change it by accident, it will update the database on the next SaveChanges. My repositories are very specific about what they return, with GetTracked and GetUntracked versions of commands. And for things like audit trails I only ever return them as untracked. (See the box below right if you are not clear on what EF tracking is.)
  4. Hiding complex T-SQL commands: as I described before, sometimes database accesses need a more sophisticated command written in T-SQL. My view is that these should be kept in one place to help maintenance. Again I should point out that the Command/Query Objects from Rob Conery’s post (see item 1) could also handle this.
Not sure what EF Tracked is?
A normal EF database query using DbContext returns ‘attached’ (tracked) classes. When SaveChanges() is called EF checks these attached classes and writes any changed data back to the database. You need to append AsNoTracking() to the LINQ command if you want a read-only copy. Untracked classes are also slightly faster to load.
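To make points 1 and 3 more concrete, and to show the AsNoTracking() call described in the box, here is a rough sketch of what such a repository might look like. The ModellerDb context and the Layer/Member classes are illustrative only, not the actual Spatial Modeller™ code.

    using System.Linq;
    using System.Data.Entity;   // EF 6: AsNoTracking()

    // Illustrative only - not the actual Spatial Modeller code.
    public class LayerRepository
    {
        private readonly ModellerDb _db;   // hypothetical DbContext

        public LayerRepository(ModellerDb db) { _db = db; }

        // Domain-specific name rather than a raw LINQ query at the call site
        public IQueryable<Member> GetAllMembersInLayerTracked(int layerId)
        {
            return _db.Members.Where(m => m.LayerId == layerId);
        }

        // Read-only version: AsNoTracking() means an accidental change to the
        // returned classes will NOT be written back on the next SaveChanges()
        public IQueryable<Member> GetAllMembersInLayerUntracked(int layerId)
        {
            return _db.Members.Where(m => m.LayerId == layerId).AsNoTracking();
        }
    }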

Non-benefits of repository and UnitOfWork patterns are:

  1. More code: repositories and UnitOfWork mean more code that needs developing, maintaining and testing.
  2. Testing: I totally agree with Isaac Abraham. Repositories are no easier to mock than IDbSet. I have tried on many occasions to mock the database, but with complex, interrelated classes it is really hard to get right. I now use EF and a test database in my unit tests for most of the complex models (see the sketch below). It’s a bit slower, but my mocks could never handle the relational fixup that EF does.
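As an illustration of that testing approach, here is a minimal sketch of a unit test that runs against a real test database rather than mocks. I have assumed NUnit and a DbContext constructor that takes a connection string name; both are illustrative choices, not a prescription.

    using System.Linq;
    using NUnit.Framework;   // assumed test framework - any would do

    [TestFixture]
    public class CustomerQueryTests
    {
        [Test]
        public void CheckCustomersWhoHaveBoughtAreReturned()
        {
            // Connect to a dedicated test database so EF behaves exactly as in production
            using (var db = new AdventureWorksDb("TestDbConnectionName"))   // hypothetical ctor
            {
                var buyers = db.Customers
                    .Where(c => c.SalesOrderHeaders.Any())
                    .ToList();

                Assert.That(buyers.Count, Is.GreaterThan(0));
            }
        }
    }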

Conclusion

I have to say I think the Command/Query Objects mentioned by Rob Conery, and described in detail in Jimmy Bogard’s post ‘Favor query objects over repositories’, have a lot going for them. While the repository/UnitOfWork pattern has served me well, maybe EF has progressed enough to use it directly, with Command/Query Objects to make access more Domain-Driven in nature.
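To give a flavour of that idea, here is a minimal sketch of a query object. The IQueryObject interface, its Execute signature and the CustomersWhoHaveBoughtQuery class are my own illustrative shapes, not taken from Jimmy Bogard’s post.

    using System.Linq;

    // The domain intent lives in the query object's name; EF's DbContext is
    // used directly rather than being hidden behind a repository.
    public interface IQueryObject<TResult>
    {
        IQueryable<TResult> Execute(AdventureWorksDb db);   // hypothetical DbContext
    }

    public class CustomersWhoHaveBoughtQuery : IQueryObject<Customer>
    {
        public IQueryable<Customer> Execute(AdventureWorksDb db)
        {
            // Only customers with at least one past sales order
            return db.Customers.Where(c => c.SalesOrderHeaders.Any());
        }
    }

    // Usage, e.g. inside an MVC action or service layer method:
    // var customers = new CustomersWhoHaveBoughtQuery().Execute(db).ToList();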

What I do in a case like this is build a small app to try out the new approach. I make sure the app includes a few complex things that I know have proved a problem in the past, but keep it small enough that it doesn’t take too long. This fits in well with me reviewing the design of Spatial Modeller™ before starting on version 2. I already have a small app looking at using T4 code generation to produce some of the boilerplate code, and I will extend it to try a non-repository approach. Should be interesting.

UPDATE – 4 months & 8 months later
Read the second part of this series, written after four months of development, with my conclusions on how to replace the Repository/Unit of Work pattern with new ways of working. Quite a journey, but I think a useful development. Have a read and see what you think. PS: There is a live example web site showing some of the approaches, which you can play with, and you can access the code.
Happy coding!

Why I’m not scared of JavaScript any more

Last Updated: May 2, 2014 | Created: April 10, 2013

My recent programming experience has been in C# and .NET, mainly on Windows apps to support my business. However, I needed to move to a web-based application, especially for our work in Africa.

The problem was that the only bits of JavaScript I had seen up to this point were a few lines to, say, handle a button click. It looked very unstructured and, for a person used to OOP and C#, rather nasty. So, what do I do in these cases… buy a book, or in this case two books. The effect was magic.

Book 1: Learning JavaScript Design Patterns

The first book I read was Learning JavaScript Design Patterns by Addy Osmani. I bought this because a) it was on design patterns and b) it was very recent. The last point is important because JavaScript is developing at such a fast pace.

This book was just what I needed. It opened my eyes to how to construct JavaScript cleanly and provide good structure to my programming. It made me much more confident about how to write well-structured JavaScript.

I really recommend this book to any programmer who doesn’t really know about the new ways of building JavaScript programs.

Book 2: Effective JavaScript

However, I knew that a book on design patterns wouldn’t tell me the nitty-gritty of the language. I already had the book JavaScript: The Good Parts by Douglas Crockford, but that hadn’t helped me to get inside the thinking of JavaScript.

I found a recent book called Effective JavaScript by David Herman. I have two Effective C# books and they really get into the depths of C#, so I got the JavaScript version. I was not disappointed.

Where the Design Patterns book gave me the overview, the Effective JavaScript book took me deep into the inner workings of JavaScript. Even from skim reading it I got a good idea of what to do and, more importantly, what NOT to do. This is now my bible of good coding standards in JavaScript.

Did the books help?

So, did reading these books make me a great JavaScript programmer? Of course not. But I was confident enough to design a fairly complex geospatial data visualisation system in JavaScript to work with OpenLayers.

I then went on to write the first basic module and some constructors to do the initial display of data on a map. They have a nice separation of concerns, and when I wanted to refactor something it was nice and easy. It also came together quite easily with few bugs, although I really miss not having unit tests (that is another blog post to come!). It’s likely I will get a JavaScript guru to build the proper system, but at least I feel confident that I know what is going on now.

Maybe in a future post I will write about why I actually LIKE JavaScript now, because there are some bits that are really nice. But more importantly, JavaScript is now the primary (only?) way ahead for responsive web design.