Entity Framework: working with an existing database

Last Updated: July 31, 2020 | Created: November 1, 2014
Quick Summary
This two-part article looks at the software issues raised when presented with a database that was designed and built prior to software development. Part 1 (this article) explores the low-level problems of connecting to that database using Entity Framework. Part 2 looks at the high-level problems of transforming the EF data into a form suitable for display via an ASP.NET MVC web site.

Part 1: Using Entity Framework with an existing database

In many large-scale projects software developers often have to work with existing SQL Server databases with predefined tables and relationships. The problem is that some predefined databases have aspects that are not so easy to deal with from the software side.

This is a two part article following my experiences of putting together a web application using the AdventureWorksLT2012 database, which is a cut-down version of the larger AdventureWorks OLTP database. I am using Microsoft’s ASP.NET MVC5 (MVC) with the proprietary Kendo UI (KendoUI) package for the UI/presentation layer. The two parts to the article are:

  1. PART 1 (this article). The best way to get Microsoft’s Entity Framework version 6 (EF) data access tool to import and access the database.
  2. PART 2: (this link) This looks at the higher-level functions needed to transform that data into a user-focused display, in this case using the GenericServices open-source library to build a service/application layer that connects to a modern web front end.

Let us start on…

Using Entity Framework with an existing database

Entity Framework 6 has a number of features to make working with existing databases fairly straightforward. In this article I detail the steps I needed to take on the EF side to build a fully featured web application to work with the AdventureWorks database. At the end I also mention some other techniques that I didn’t need for AdventureWorks, but I have needed on other databases. The aim is to show how you can use EF with pre-existing databases, including ones that need direct access to T-SQL commands and/or Stored Procedures.

1. Creating the Entity Framework Classes from the existing database

Entity Framework has a well documented approach, called reverse engineering, to create the EF Entity Classes and DbContext from an existing database which you can read here. This produces data classes with various Data Annotations to set some of the properties, such as string length and nullability (see below), plus a DbContext with an OnModelCreating method to set up the various relationships.

namespace DataLayer.GeneratedEf
{
    [Table("SalesLT.Customer")]
    public partial class Customer
    {
        public Customer()
        {
            CustomerAddresses = new HashSet<CustomerAddress>();
            SalesOrderHeaders = new HashSet<SalesOrderHeader>();
        }

        public int CustomerID { get; set; }

        public bool NameStyle { get; set; }

        [StringLength(8)]
        public string Title { get; set; }

        [Required]
        [StringLength(50)]
        public string FirstName { get; set; }

        [StringLength(50)]
        public string MiddleName { get; set; }

        //more properties left out to shorten the class...
        //Now the relationships

        public virtual ICollection<CustomerAddress> CustomerAddresses { get; set; }

        public virtual ICollection<SalesOrderHeader> SalesOrderHeaders { get; set; }
    }
}

This does a good job of building the classes. Certainly having the Data Annotations is very useful as front-end systems like MVC use these for data validation during input. However I did have a couple of problems:

  1. The default code generation template includes the `virtual` keyword on all of the relationships. This enables lazy loading, which I do not want. (see section 1.1 below)
  2. The table SalesOrderDetail has two keys: one is the SalesOrderID and one is an identity, SalesOrderDetailID. EF failed on a create and I needed to fix this. (See section 1.2 below)

I will now describe how I fixed these issues.

1.1. Removing lazy loading by altering the scaffolding of the EF classes/DbContext

As I said earlier the standard templates enable ‘lazy loading’. I have been corrected in my understanding of lazy loading by some readers. The documentation states that ‘Lazy loading is the process whereby an entity or collection of entities is automatically loaded from the database the first time that a property referring to the entity/entities is accessed’. The problem with this is that it does not make for efficient SQL, as an individual SQL SELECT command is issued on each first access to a virtual relationship, which is not such a good idea for performance.
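To make the problem concrete, here is a minimal sketch; the DbSet name Customers is an assumption based on the generated classes above. With lazy loading each first access to a virtual navigation property fires its own SELECT, whereas an eager Include brings everything back in one query.

using (var db = new AdventureWorksLt2012())
{
    //Lazy loading: one SELECT for the customers, then one extra SELECT per
    //customer the first time its SalesOrderHeaders collection is touched (N+1 queries)
    foreach (var customer in db.Customers.ToList())
    {
        var orderCount = customer.SalesOrderHeaders.Count;
    }

    //Eager loading: a single SQL query returns the customers and their orders
    //(the lambda form of Include needs 'using System.Data.Entity;')
    var customersWithOrders = db.Customers
        .Include(c => c.SalesOrderHeaders)
        .ToList();
}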

As you will see in part 2 of this article I use a technique that ‘shapes’ the SQL read to load only the individual properties or relationships I need. I therefore do not need, or want, lazy loading of relationships.

Now you could hand edit each generated class to remove the ‘virtual’, but what happens if (when!) the database changes? You would then reimport the database and lose all your edits, which you or your colleague might have forgotten about by then, and suddenly your whole web application slows down. No, the common rule with generated code is not to edit it. In this case the answer is to change the code generated during the creation of the classes and DbContext. This leads me on to…

Note: You can turn off lazy loading via the EF Configuration class too, but I prefer to remove the virtual keyword as it ensures that lazy loading is definitely off.
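For completeness, here is a minimal sketch of that configuration alternative, shown on a hypothetical context rather than the generated AdventureWorks one (whose constructor you would not want to hand edit):

public class MyDbContext : DbContext    //hypothetical context, for illustration only
{
    public MyDbContext() : base("Name=MyDbContext")
    {
        //These two settings turn lazy loading off for every query run by this context
        Configuration.LazyLoadingEnabled = false;
        Configuration.ProxyCreationEnabled = false;
    }

    public DbSet<Customer> Customers { get; set; }
}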

The generation of the EF classes and DbContext is done using some t4 templates, referred to as scaffolding. By default it uses some internal scaffolding, but you can import the scaffolding and change it. There is a very clear explanation of how to import the scaffolding using NuGet here, so I’m not going to repeat it.

Once you have installed the EntityFramework.CodeTemplates you will find two files called Context.cs.t4 and EntityType.cs.t4, which control how the DbContext and each entity class are built respectively. Even if you aren’t familiar with t4 (a great tool) you can understand what it does – it’s a code generator and anything not surrounded by <# #> is standard text. I found the word virtual in the EntityType.cs.t4 and deleted it. I also removed virtual from the Context.cs.t4 file on the declaration of the DbSet<>.

You may want to alter the scaffolding more extensively, say to add a [Key] attribute on primary keys for some reason. All is possible, but you must dig into the .t4 code in more depth.

One warning about importing scaffolding – Visual Studio threw a nasty error message when I first tried to import using the EntityFramework.CodeTemplates scaffolding (see stackoverflow entry). It took a bit of finding but it turns out if you have Entity Framework Power Tools Beta 4 installed then they clash. If you have Entity Framework Power Tools installed then you need to disable it and restart Visual Studio before you can import/reverse engineer a database. I hope that gets fixed as Entity Framework Power Tools is very useful.

Note: There are two other methods to reverse engineer an existing database:

  1. EntityFramework Reverse POCO Code First Generator by Simon Hughes. This is a Visual Studio extension recommended by the EF guru, Julia Lerman, in one of her MSDN magazine articles. I haven’t tried it, but if Julia recommends it then it’s good.
  2. Entity Framework Power Tools Beta 4 can also reverse engineer a database. It’s quicker, only two clicks, but it’s less controllable. I don’t suggest you use this.

1.2 Fixing a problem with how the two keys are defined in SalesOrderDetail table

The standard definition of the SalesOrderDetail table’s key parts is as follows:

[Table("SalesLT.SalesOrderDetail")]
public partial class SalesOrderDetail
{
    [Key]
    [Column(Order = 0)]
    [DatabaseGenerated(DatabaseGeneratedOption.None)]
    public int SalesOrderID { get; set; }

    [Key]
    [Column(Order = 1)]
    public int SalesOrderDetailID { get; set; }

    //other properties left out for clarity...
}

You can see it marks the first as not database generated, but it does not mark the second as an Identity key. This caused problems when I tried to create a new SalesOrderDetail so that I could add a line item to an order. I got the SQL error:

Cannot insert explicit value for identity column in table ‘SalesOrderDetail’ when IDENTITY_INSERT is set to OFF.

That confused me for a bit, as other two-key items had worked, such as CustomerAddress. I tried a few things but as it looked like an EF error I tried telling EF that the SalesOrderDetailID was an Identity key by using the attribute [DatabaseGenerated(DatabaseGeneratedOption.Identity)]. That fixed it!

The best solution would be to edit the scaffolding again to always add that attribute to identity keys. That needed a bit of work and the demo was two days away, so in the meantime I added the needed attribute using the MetadataType attribute and a ‘buddy’ class. This is a generally useful feature so I use this example to show you how to do it in the next section.

1.3. Adding new DataAnnotations to EF generated classes

Being able to add attributes to properties in already generated classes is a generally useful thing to do. I needed it to fix the key problem (see 1.2 above), but you might want to add some DataAnnotations to help the UI/presentation layer such as marking properties with their datatype, e.g. [DataType(DataType.Date)]. The process for doing this is given in the Example section of this link to the MetadataType attribute. I will show you my example of adding the missing Identity attribute.

The process requires me to add a partial class in another file (see section 3 later for more on this), mark it with the [MetadataType(typeof(SalesOrderDetailMetaData))] attribute, and then apply the required attribute to the SalesOrderDetailID property in a new class, sometimes called a ‘buddy’ class. See below:

[MetadataType(typeof(SalesOrderDetailMetaData))]
public partial class SalesOrderDetail : IModifiedEntity
{
}

public class SalesOrderDetailMetaData
{
    [DatabaseGenerated(DatabaseGeneratedOption.Identity)]
    public int SalesOrderDetailID { get; set; }
}

The effect is to apply those attributes to the existing properties. That fixed my problem with EF creating a new SalesOrderDetail properly and I was away.

2. What happens when the database changes?

Having sorted the scaffolding as discussed above, just repeat step 1, ‘Creating the Entity Framework Classes from the existing database’. There are a few things you need to do before, during and after the reimport.

  1. You should remember/copy the name of the DbContext so you use the same name when you reimport. That way it will recompile properly without major name changes.
  2. Because you are using the same name as the existing DbContext you must delete the previous DbContext, otherwise the reimporting process will fail. If it’s easier you can delete all the generated files as they are replaced anyway. That is why I suggest you put them in a separate directory with no other files added.
  3. When reimporting, by default the process will add the connection string to your App.Config file again. I suggest you un-tick that, otherwise you end up with lots of connection strings (a minor point, but it can be confusing).
  4. If you use source control (I really recommend you do) then a quick compare of the files to check what has changed is worthwhile.

3. Adding new properties or methods to the Entity classes

In my case I wanted to add some more properties or methods to the generated classes. Clearly I can’t add properties that change the database – I would have to talk to the DBA to change the database definition and import the new database schema again. However I could add properties that access existing database properties to produce more useful output, or to have an intention-revealing name, like HasSalesOrders.

You can do this because the scaffolding produces ‘partial’ classes, which means I can have another file which adds to that class. To do this it must:

  1. Have the same namespace as the generated classes
  2. The class is declared as public partial <same class name>.

I recommend you put them in a different folder to the generated files. That way they will not be overwritten by accident when you recreate the generated files (note: the namespace must be the original namespace, not that of the new folder). Below I give an example where I added properties to the Customer class. Ignore for now the IModifiedEntity interface (dealt with later in this article) and the [Computed] attribute (dealt with in part 2 of the article).

public partial class Customer : IModifiedEntity
{
    [Computed]
    public string FullName { get { return Title + " " + FirstName + " " + LastName + " " + Suffix; } }

    /// <summary>
    /// This is true if any sales orders. We use this to decide if a 'Customer' has actually bought anything
    /// </summary>
    [Computed]
    public bool HasSalesOrders { get { return SalesOrderHeaders.Any(); } }
}

Note that you almost certainly will want to add to the DbContext class (I did – see section 4 below). This is also defined as a partial class so you can use the same approach. Which leads me on to…

4. Dealing with properties best dealt with at the Data Layer

In the AdventureWorks database there are two properties called ‘ModifiedDate’ and ‘rowguid’. ModifiedDate needs to be updated on create or update and the rowguid needs to be added on create.

Many databases have properties like this and they are best dealt with at the Data/Infrastructure layer. With EF this can be done by providing a partial class and overriding the SaveChanges() method to handle the specific issues your database needs. In the case of AdventureWorks I added an IModifiedEntity interface to each partial class that has ModifiedDate and rowguid properties.

Then I added the code below to the AdventureWorksLt2012 DbContext to provide the functionality required by this database.

public partial class AdventureWorksLt2012 : IGenericServicesDbContext
{
    /// <summary>
    /// This has been overridden to handle ModifiedDate and rowguid
    /// </summary>
    /// <returns></returns>
    public override int SaveChanges()
    {
        HandleChangeTracking();
        return base.SaveChanges();
    }

    /// <summary>
    /// This handles going through all the entities that have
    /// changed and seeing if we need to do anything.
    /// </summary>
    private void HandleChangeTracking()
    {
        foreach (var entity in ChangeTracker.Entries()
           .Where(e => e.State == EntityState.Added
               || e.State == EntityState.Modified))
        {
            UpdateTrackedEntity(entity);
        }
    }

    /// <summary>
    /// Looks at everything that has changed and
    /// applies any further action if required.
    /// </summary>
    /// <param name="entityEntry"></param>
    /// <returns></returns>
    private static void UpdateTrackedEntity(DbEntityEntry entityEntry)
    {
        var trackUpdateClass = entityEntry.Entity as IModifiedEntity;
        if (trackUpdateClass == null) return;
        trackUpdateClass.ModifiedDate = DateTime.UtcNow;
        if (entityEntry.State == EntityState.Added)
            trackUpdateClass.rowguid = Guid.NewGuid();
    }
}

The IModifiedEntity interface is really simple:

//This interface is added to all the database entities
//that have a modified date and rowGuid. Save Changes uses this
// to find entities that need the date updating, or a new rowguid added
public interface IModifiedEntity
{
    DateTime ModifiedDate { get; set; }
    Guid rowguid { get; set; }
}

5. Using SQL Stored Procedures

Some databases rely on SQL Stored Procedures (SPs) for insert, update and delete of rows in a table. AdventureWorksLT2012 did not, but if you need to then EF 6 has added a neat way of linking to stored procedures. It’s not trivial, but you can find good information here on how to do it.
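As a flavour of what that linking looks like, here is a minimal sketch using EF 6’s MapToStoredProcedures fluent API inside OnModelCreating; the stored procedure names are hypothetical and would need to match whatever already exists in the database:

protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
    //Map EF's insert/update/delete of SalesOrderDetail onto existing stored procedures.
    //The procedure names below are made up - use the ones the database actually has.
    modelBuilder.Entity<SalesOrderDetail>()
        .MapToStoredProcedures(s => s
            .Insert(i => i.HasName("SalesLT.SalesOrderDetail_Insert"))
            .Update(u => u.HasName("SalesLT.SalesOrderDetail_Update"))
            .Delete(d => d.HasName("SalesLT.SalesOrderDetail_Delete")));
}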

Clearly if the database needs SPs for CUD (Create, Update and Delete) actions then you need to use them. However using EF’s CUD actions is easier from the software point of view, and EF’s CUD actions have some nice features. For instance EF keeps an in-memory copy of the original values and uses this to work out what has changed.

The benefit is that EF updates are efficient – you update one property and only that column in the row is updated. A more subtle benefit concerns SQL column-level security (Grant/Deny): because unchanged properties are not included in the UPDATE, they do not trigger a security violation. This is a bit of an esoteric feature, but I have used it and it works well.
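A small sketch of what that means in practice (again assuming the generated Customers DbSet): after reading an entity, changing a single property and calling SaveChanges produces an UPDATE containing just that one column.

using (var db = new AdventureWorksLt2012())
{
    //EF keeps the original values in its change tracker when the entity is read in
    var customer = db.Customers.Find(1);

    customer.Title = "Dr";      //change just one property

    //SaveChanges compares current values against the originals and generates
    //UPDATE SalesLT.Customer SET Title = @p ... WHERE CustomerID = @key
    //- only the changed column appears in the SET clause
    db.SaveChanges();
}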

6. Other things you could do

This is all I had to do to get EF to work with an existing database, but there are other things I have had to use in the past. Here is a quick run through of other items:

6a. Using Direct SQL commands

Sometimes it makes sense to bypass EF and use a SQL command, and EF has all the commands to allow you to do this. The EF documentation has a page on this here which gives a reasonable overview, but I recommend Julia Lerman’s book ‘Programming Entity Framework: DbContext’ which goes into this in more detail (note: this book is very useful but it covers an earlier version of EF so misses some of the latest commands like the use of SPs in Insert, Update and Delete).

For certain types of reads SQL makes a lot of sense. For instance in my GenericSecurity library I need to read the current SQL security setup (see below). I think you will agree it makes a lot of sense to do this with a direct SQL read rather than defining multiple data classes just to build the command.

var allUsers = db.Database.SqlQuery<SqlUserAndRoleRow>(
@"select mp.name as UserName, rp.name as RoleName, mp.type as UserType
from sys.database_role_members drm
join sys.database_principals rp on (drm.role_principal_id = rp.principal_id)
join sys.database_principals mp on (drm.member_principal_id = mp.principal_id)
ORDER BY UserName");

For SQL commands such as create, update and delete it is less obvious, but I have used it in some cases. For these you use the ExecuteSqlCommand method – see the example from Microsoft below:

using (var context = new BloggingContext())
{
    context.Database.ExecuteSqlCommand(
        "UPDATE dbo.Blogs SET Name = 'Another Name' WHERE BlogId = 1");
}

Neither of these examples had parameters, but if you do need parameters then the SqlQuery and ExecuteSqlCommand methods can take them, and they are passed as database parameters to protect against a SQL injection attack. The Database.SqlQuery method documentation shows this.
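Here is a minimal sketch of the parameterised forms, reusing the Blogs table from the Microsoft example above; the values are passed as database parameters rather than being concatenated into the SQL string:

var blogId = 1;

//@p0, @p1 etc. refer to the extra arguments in order, and each one is sent
//to SQL Server as a DbParameter, not spliced into the SQL text
var blogNames = db.Database.SqlQuery<string>(
    "SELECT Name FROM dbo.Blogs WHERE BlogId = @p0", blogId).ToList();

db.Database.ExecuteSqlCommand(
    "UPDATE dbo.Blogs SET Name = @p0 WHERE BlogId = @p1",
    "Another Name", blogId);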

One warning on raw SQL commands. Once you have run one, EF’s view of the database, some of which is held in memory, is out of date. If you are going to close/dispose of the DbContext straight away then that isn’t a problem. However if the command is followed by other EF accesses, read or write, then you should use the EF ‘Reload’ command to get EF back on track. See my stackoverflow answer here for more on this.
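A minimal sketch of that, again based on the BloggingContext example: the tracked entity is refreshed from the database after the raw SQL has changed the row behind EF’s back.

using (var context = new BloggingContext())
{
    var blog = context.Blogs.Find(1);       //blog is now tracked by EF

    //this raw SQL bypasses the change tracker, so 'blog' now holds stale values
    context.Database.ExecuteSqlCommand(
        "UPDATE dbo.Blogs SET Name = 'Another Name' WHERE BlogId = 1");

    //Reload re-reads the row and refreshes the tracked entity
    context.Entry(blog).Reload();
}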

6b. SQL Transaction control

When using EF to do any database updates via the .SaveChanges() method, all the changes are done in one transaction, i.e. if one fails then none of the updates are committed. However if you are using raw SQL updates, or a combination of EF and SQL updates, you may well need these to be done in one transaction. Thankfully EF version 6 introduced commands to allow you to control transactions.

I used these commands in my EF code to work with SQL security. I wanted to execute a set of SQL commands to set up SQL Security roles and grant/deny access, but if any one failed I wanted to roll back. The code to execute a sequence of SQL commands and roll back if any single command fails is given below:

using (var dbContextTransaction = db.Database.BeginTransaction())
{
    try
    {
        foreach (var text in sqlCommands)
        {
            db.Database.ExecuteSqlCommand(text);
        }
        dbContextTransaction.Commit();
    }
    catch (Exception ex)
    {
        dbContextTransaction.Rollback();
        //report the error in some way
    }
}

You can also use the same transaction commands when mixing raw SQL commands and EF commands. See the EF documentation for an example of that.
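A minimal sketch of that mixed case is below (the Posts DbSet is an assumption, following on from the BloggingContext example): the raw SQL and the EF SaveChanges run on the same connection and either commit or roll back together.

using (var context = new BloggingContext())
using (var dbContextTransaction = context.Database.BeginTransaction())
{
    try
    {
        //a raw SQL update...
        context.Database.ExecuteSqlCommand(
            "UPDATE dbo.Blogs SET Name = 'Renamed' WHERE BlogId = 1");

        //...and an EF update, both inside the same transaction
        var post = context.Posts.First();
        post.Title = "Changed in the same transaction";
        context.SaveChanges();

        dbContextTransaction.Commit();
    }
    catch (Exception)
    {
        dbContextTransaction.Rollback();
        throw;
    }
}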

Conclusion

There were a few issues to sort out but all of them were fixable. Overall getting EF to work with an existing database was fairly straightforward, once you know how. The problem I had with multiple keys (see section 1.2) was nasty, but now I, and you, know about it we can handle it in the future.

I think the AdventureWorks Lite database is complex enough to be a challenge: with lots of relationships, composite primary keys, computed columns, nullable properties etc. Therefore getting EF to work with AdventureWorks is a good test of EF’s capability to work with existing SQL databases. While the AdventureWorks Lite database did not need any raw SQL queries or Stored Procedures, other projects of mine have used these, and I have mentioned some of these features at the end of the article to complete the picture.

In fact version 6 of EF added a significant amount of extra features and commands to make mixed EF/SQL access very possible. The more I dig into things the more goodies I find in EF 6. For instance EF 6 brought in Retry logic for Azure, handling transaction commit failures, SQL transaction control, improved sharing of connections between SQL and EF, plus a number of other things. Have a good look around the EF documentation – there is a lot there.

So, no need to hold back on using Entity Framework on your next project that has to work with an existing SQL database. You can use it in a major role as I did, or, now that there is good connection sharing, just use it for the simple CRUD cases that do not need heavy T-SQL methods. Happy coding.

Now read Part 2, which looks at the data to user interface side

Is the Repository pattern useful with Entity Framework? – part 2

Last Updated: July 31, 2020 | Created: September 19, 2014
Quick Summary of Part 2
Having decided that the Repository/Unit of Work pattern does not add anything to Entity Framework (EF) the question was how to tame EF to work with Domain-Driven Design principles. This post details my new approach to using EF and why it is better.

This is a series:

  1. Part 1: Analysing whether Repository pattern useful with Entity Framework
  2. Part 2: Four months on – my solution to replacing the Repository pattern (this article).
  3. UPDATE (2018): Big re-write to take into account Entity Framework Core, and further learning.

Four months ago I was reviewing the design of Spatial Modeller™ version one and I came across people who said that the Repository/Unit of Work pattern was not useful with Microsoft’s main data access technology, Entity Framework (shortened to EF from now on).

See items marked with ‘UPDATED 2015‘ for new comments added eight months on from the first article.

And ‘UPDATED 2017′ for new comments added from my book Entity Framework Core in Action, which I am writing for Manning Publishing.

At that time I wrote a blog post called ‘Is the Repository pattern useful with Entity Framework? – part 1‘ where I looked critically at what people said. At the time I was fairly convinced, but I needed to write some real code to see how I would do things without a repository.

Well, four months on the answer is – I’m convinced. The Repository/Unit of Work pattern does NOT add anything to EF. In fact it hides some useful features that a repository has trouble using. (UPDATE 2015: eight months on I am even more convinced as I have now built some more representative applications).

However the issue is – what replaces the good bits of the repository, like handling Domain-Driven Design (DDD), which tries to treat the data persistence library as a simple in-memory collection. This means we want to hide some of EF’s code, but still be able to use all of its features. It has taken me a little while to refine my approach, but here are my thoughts so far.

One size does not fit all

After some trials it became clear that three parts of the application needed to interact with the database:

  1. The User Interface (UI) which needs to display data in lots of different ways and often needs help when editing data, i.e. it might have to provide the user with a list of options using a dropdown list.
  2. The WebApi or Web Service, which is similar to the User Interface but has less need for user selection from a list.
  3. The Business Layer which needs to hide some of the EF code and make the database feel like a searchable collection.

Realising that there were different parts was a breakthrough as I could now design different ways to interface to EF to meet the different requirements. I combined the UI and the WebApi as they are very similar and then provided a different approach for the business layer.

Design 1: CRUD data access for User Interface layer and WebApi

The front-end of any application needs to do CRUD operations: that is Create, Read, Update and Delete of entries. Create and Update can be particularly difficult as it might need other data to guide the user. For instance on my example site the create/update of a Post needs the user to select the right Tags and, in my example, the Author (try http://samplemvcwebapp.net/Posts and then edit an entry). That needs extra data, which is where Data Transfer Objects (DTOs) come in.

I am a strong proponent of DTOs: I have used them heavily and I find they really help decouple the data and business layers from the UI or WebApi. DTOs are also good at dealing with the issue of Aggregates, that is a class that has subclasses under it. For instance you might have a class called Order, which has a collection of LineItems under it, and most likely a link to a Customer. DTOs can extract all the various parts and either ‘flatten’ them into one class, or for collections extract only the parts of the data that the user wants/is allowed to see.

The important point is to design the reads from the database so that they turn into efficient SQL requests. This needs a bit of thought, but isn’t that hard to do. Using LINQ and IQueryable<T> requests allows paging and filtering to be added at the last minute at the UI/WebApi end, and adding a .Select projection means only the columns/properties you need are read in. The benefit of all this is that the SQL request only asks for those rows and properties that the request needs, which makes it as efficient as possible.
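As a sketch of what that looks like in code (the Post/Blog property names here are hypothetical stand-ins, not the actual SampleMvcWebApp classes): the Select projection keeps the query as an IQueryable, so paging can be added at the UI end and the final SQL only asks for the listed columns.

public class PostListDto      //holds just what the list screen needs
{
    public int PostId { get; set; }
    public string Title { get; set; }
    public string BlogName { get; set; }
}

//db is an instance of the application's DbContext
IQueryable<PostListDto> query = db.Posts
    .Select(p => new PostListDto
    {
        PostId = p.PostId,
        Title = p.Title,
        BlogName = p.Blog.Name      //'flattens' the related Blog into the DTO
    });

//paging/filtering added at the last minute - only now is the SQL actually run
var page = query.OrderBy(d => d.Title).Skip(20).Take(10).ToList();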

So an approach that uses DTOs is good. The only down side is that writing the data access code for DTOs is repetitive and boring. I therefore looked at how to automate this. As I said in part 1 of this post I thought I might use T4 code generation. However in the end I pursued a route using .NET Generics. The result is an Open Source library called GenericServices which provides List, Detail, Create, Update and Delete functions on either the EF data classes themselves or, more interestingly, via DTOs linked to EF data classes.

Design 1: Implementation (UPDATED 2015)

Rather than describe how this all works here I have built two example web sites.

1. SampleMvcWebApp.net, which is a simple application to show GenericServices in action.

2. (UPDATE 2015) Complex.SampleMvcWebApp, which is a much more realistic example of a real application. The interesting thing for me is that it only took me two weeks to write the whole application (well, plus a few days for adding supporting comments and a few tweaks to improve it). That is way faster than my previous applications and proved to me the power of GenericServices.

I have written two articles that describe the design of the Complex.SampleMvcWebApp site. They are:

(UPDATE 2015) The GenericServices Wiki now contains lots of information and GenericServices is now available on NuGet – see GenericServices on NuGet. This means you can try GenericServices yourself.


Design 2: The Business Layer (UPDATED 2015)

UPDATE: I now have an improved approach to handling business logic which you can find in the article Architecture of Business Layer working with Entity Framework (Core and v6) – revisited and in Chapter 4 of the book I am writing for Manning Publishing, Entity Framework Core in Action.

For the business layer I have taken a different approach. From a design/architecture point of view I do three things.

  1. I allow the business layer to directly access the data via Entity Framework (EF). No repository, no unit of work, just good old EF. This means that I have access to all of EF’s commands and nothing is hidden from me.
  2. The problem is that EF on its own can be a bit of a handful, so I tame it with ‘Query Objects’. Mainly I want to make repetitive or complicated EF code appear as simple methods with intention-revealing names. See the implementation section for an example of this.
  3. In the business layer I try to treat the database as an in-memory collection with some search procedures, i.e. Query Objects. I try not to use the data access methods other than for loading or inserting new entries. Any other commands are hidden in Query Objects or other methods.

Let me explain how this works by talking about the implementation.

Design 2: Implementation

So the implementation of using EF in the business layers consists of four parts:

1. Building ‘Query Objects’ to hide some of the complex filtering/loading

UPDATE 2017: In chapter 2 of the book Entity Framework Core in Action I describe building a book listing with sorting, filtering and paging using a series of query objects. You can see a live example site showing you those query objects in action.

In the first post I mentioned that people were talking about Query Objects, which are an encapsulation of particular database queries or updates. In C#, extension methods are a great tool for building Query Objects. If you have seen LINQ fluent commands then that is how extension methods work. Below is an example from Spatial Modeller™ which loads all the schemes whose key is in the collection parameter, together with the schemes’ layers, members, attributes and services.

public static IQueryable<SmScheme> GetSchemesByIdIncludingAllChildClasses
    ( this SpatialModellerDb db, ICollection<int> schemeIdsToLoad)
{
    return db.SmSchemes
        .Include(x => x.Grades)
        .Include(
            x => x.Layers.Select(y => y.Members
                              .Select(z => z.Attributes)))
        .Include(
            x =>
                x.Layers.Select(y => y.Members
                               .Select(z => z.ServicesHandled)))
        .Where(
            x => schemeIdsToLoad.Contains(x.SmSchemeId));
}

This code hides the EF code needed to perform the database action inside a nicely named method. This code would be in the data/infrastructure layer, which means the EF code is more centralised in one place. I would recommend the article ‘Giving Clarity to LINQ Queries by Extending Expressions‘, which has some excellent ideas and source code for building ‘query objects’.

2. Using Domain-Driven Design (DDD) principles

I really like the thinking behind DDD. It has made me a better programmer. While DDD accepts that the way data is stored/persisted will have an effect on the design (see Eric Evans’ book, page 159), the aim is to allow Business Layer code to treat the database as an in-memory collection with some search procedures, with as little interaction as possible with the data access code.

This isn’t done just to hide the database, but mainly to ensure that the developers working in the business layer are thinking about the business problem first, not how the database works. DDD recognises that there might need to be some implementation compromises to work with the database, but the overall aim is to keep the focus on the business problems.

The other issue that affects the approach used is that the actual data saved in the database is designed for the business layer. Therefore the business layer uses the data classes directly, as they are a good fit to the business problem.

Update after a lot more business layer work

I still use a DDD approach, but I have found that EF doesn’t really support a pure DDD design for a number of reasons. Firstly the EF database classes need to be at the data layer level, so these classes aren’t really the right place to have business logic (I have a separate layer/assembly for that). Also EF sets certain restrictions on the way collections are handled, which means you can’t use an IReadOnlyCollection; this makes a DDD root/aggregate approach, where the sub-classes of a DDD root can only be changed via the root, hard to enforce. Because of this I still follow a DDD design approach, but some of the more stringent access rules have to be implemented simply by setting standards, not by programmatically enforced rules.

Even so my business layer is able to use the DDD principle of treating the database as just classes, which makes the business logic much simpler. To do this a typical business pattern has three phases:

  1. Read in any data it needs for the business logic.
  2. Some pure business logic working on the data classes – the bulk of the code.
  3. An optional add/remove phase if any new data needs to be added to the database (note: updates happen automatically if you just change the database classes read in in phase 1).

3. The Business Layer never calls EF’s .SaveChanges() to persist data

The business layer should not know about persisting/updating data. I therefore design the business layer to call .Add() or .Remove() on the EF collections, but the final update is triggered by the Service Layer after the business process returns.
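Here is a minimal sketch of that split; all the class names are hypothetical and the business logic is reduced to almost nothing, but it shows where the Add happens and where the single SaveChanges lives.

//hypothetical entity and context, just to make the sketch self-contained
public class Order
{
    public int OrderId { get; set; }
    public DateTime PlacedOn { get; set; }
}

public class MyDbContext : DbContext
{
    public DbSet<Order> Orders { get; set; }
}

//Business layer: adds to the EF collection, but never calls SaveChanges
public class PlaceOrderAction
{
    private readonly MyDbContext _db;
    public PlaceOrderAction(MyDbContext db) { _db = db; }

    public Order DoAction()
    {
        var order = new Order { PlacedOn = DateTime.UtcNow };  //the 'pure' business logic would go here
        _db.Orders.Add(order);                                 //phase 3: add, but no commit
        return order;
    }
}

//Service layer: runs the business action and owns the single commit
public class OrderService
{
    private readonly MyDbContext _db;
    public OrderService(MyDbContext db) { _db = db; }

    public Order PlaceOrder()
    {
        var order = new PlaceOrderAction(_db).DoAction();
        _db.SaveChanges();      //one transaction covering the whole business process
        return order;
    }
}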

This sounds subtle, but it has proved to be really useful. I go into this in more detail in my new article Architecture of Business Layer working with Entity Framework but here are three good reasons why it helps.

  • The single big commit meant that the modelled data was either there or not – no cleaning up if the business layer had done part of the job and committed it but a later part failed.
  • The Business Layer does not have to handle database validation errors, which can occur when ‘SaveChanges’ is called. The Service Layer does that, which is better placed to deal with any errors.
  • The business layer was easier to test as it added data to the EF collections, but didn’t change the database.

Conclusion (UPDATED 2015)

Doing all this is hard work, especially building the GenericServices library, but I really do think this will speed up future development. I built the SampleMvcWebApp fairly quickly, but that was because I was developing GenericServices.

However when I built Complex.SampleMvcWebApp, which is a much more representative application, it only took two weeks, which I thought was fantastic. Have a look at the site – it’s fairly complex in terms of data manipulation even if I didn’t spend time on the visual design. NOTE: I have not implemented any business logic at all in the sample. If I had it would have taken a lot longer, because business logic is by its very nature specific to a project.

The things I immediately noticed when swapping to the new approach are:

  • The separation of concerns is much better, with the EF code more focused to where it is needed.
  • Creating a good repository pattern is hard and I used to find myself fiddling with the repository implementation during the project as I hit new issues. That is all gone.
  • Using EF directly allows me to use the full range of commands, like explicitly loading relationships that are not currently loaded. I now realise that when I used a repository pattern I was having to tweak the code to get round things that EF had commands to solve.

The primary helper is GenericServices, which as of 2015 is now released on NuGet – see GenericServices on NuGet. Without GenericServices it would be a hard slog to build the Service Layer. What I noticed GenericServices did was:

  • Writing the DTOs is all about getting the right names/properties. Everything else is handled by GenericServices. So much quicker to write and more robust.
  • If there was a special case then it was easy to override one of the GenericServices DTO’s virtual methods. As part of the exercise of building Complex.SampleMvcWebApp I made the method names more obvious to stop me having to look them up every time.
  • The MVC Controller code is now the same for all but the most complex situation, with just the data classes changing. This makes writing the MVC side much easier, and less error prone. I plan to create some scaffolding but currently waiting for MVC 6 (vNext) which has new scaffolding options.

I have also written a private library called GenericActions to help with calling business methods, but that is a subject of another article.

I hope that is helpful to people. Happy coding!

Analysis of Entity Framework 6 async performance

Last Updated: July 31, 2020 | Created: July 17, 2014
Quick Summary
This post is an appendix to an article I am writing for the Simple-Talk blog site. It contains the detailed results of side-by-side comparisons of sync and async versions of the Entity Framework (EF) 6 data accesses. This shows that async EF commands are not that much slower than normal sync commands.

I am writing an article called ‘The .NET 4.5 async/await Commands in Promise and Practice‘ for the Simple-Talk blog. In this I look at the whole area of the async/await commands for tasking.

In putting this article together I ran extensive side-by-side comparisons of normal, synchronous Entity Framework (EF) commands and the new async versions of the same commands. I found some surprising results which I think others might be interested in.

The Simple-Talk article wasn’t the right place to put all this technical detail so I have written this blog as an appendix for those of you that would like to drill-down into the detail. I think you will find the results interesting.

Overview of the test database

[Figure: the three classes used in the test database – Blog, Post and Tag – and their relationships]

I am building a .NET library called GenericServices (see previous blog post on this) and as the test database I have a simple blog site with three classes in it: Blog, Post and Tag. The figure above shows the data classes and their relationships.

Most of the tests are done on the Post class, which is the most complex to update, and deleting it does not cause cascade deletes. Also, for update I assume a disconnected update, i.e. as would happen on a web site, where the original Post is read and then specific properties are updated before being written back to the database.

Test 1. Raw EF performance

The first test results I will share with you are the raw speed of the sync and async versions of the EF commands. This was done inside an NUnit test running inside Visual Studio 2013 on my development PC which has an Intel i7 processor clocked at 3.31GHz. The database is LocalDB running SQL Server 2012. I ran 1000 tests on a database filled with 1000 entries and averaged out the time for each command.

You can see the unit test code here (I used method Perf21NCompareEfSyncAndAsync1000Ok around line 61) and the actual EF commands here (note: GenericServices is an open-source project).
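For anyone who does not want to dig through the repository, here is a minimal sketch of the kind of side-by-side timing the tests do (MyBlogContext and its Posts DbSet are stand-ins for the real test classes; the usual usings for System.Diagnostics, System.Linq, System.Data.Entity and System.Threading.Tasks are assumed):

public static async Task TimeSyncVsAsync(MyBlogContext db)
{
    const int runs = 1000;

    var watch = Stopwatch.StartNew();
    for (var i = 0; i < runs; i++)
        db.Posts.ToList();                    //normal, synchronous EF read
    watch.Stop();
    Console.WriteLine("Sync:  {0:f2} ms/query", watch.ElapsedMilliseconds / (double)runs);

    watch.Restart();
    for (var i = 0; i < runs; i++)
        await db.Posts.ToListAsync();         //async EF read (ToListAsync is in System.Data.Entity)
    watch.Stop();
    Console.WriteLine("Async: {0:f2} ms/query", watch.ElapsedMilliseconds / (double)runs);
}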

Let me list the results first and then draw some conclusions from this.

| Action | Sync (ms) | Async (ms) | Diff | Notes |
|---|---|---|---|---|
| List, Direct | 2.80 | 7.80 | 279% | Just reads Post class |
| List, DTO | 16.80 | 21.00 | 125% | Reads Post, Blog and Tags |
| Create | 10.40 | 8.80 | 85% | Reads Tags and writes Post |
| Update | 15.70 | 9.70 | 62% | Reads Post, Tags and writes Post |
| Delete | 0.90 | 1.10 | 122% | Simple state update |

Analysis of raw EF performance

  1. Async does not have much of an overhead
    I was surprised that the async versions were so fast. Lots of blog posts warn about async being slow, but on the simplest method, which was listing the Post class it was only 5ms slower. That, I think is impressive. However in the unit tests the context it was saving was small (see my simple-talk article to learn more about context) and it also caches the context so one off commands might take longer. I look at single reads later in the real-world section below.
  2. Some of the async commands are faster!?
    You can see that the async versions of create and update are faster (see the Diff column). Why is that? I am not totally sure, but I think it is because there are multiple database accesses (see the notes column) and I think it manages to overlap these. If someone has a better explanation then please let me know.

Test 2. Real-world performance on ASP.NET MVC5 site

Raw figures like that are interesting, but it’s real-world performance that matters. I have a companion MVC5 web site called SampleMvcWebApp, also open-source, which I try things out on. This has the same Blog, Post, Tag format described at the start, but with only a few entries (default 17). I have this MVC web app on different hosting environments, plus internally:

  1. A low-cost shared site running Windows Server 2012 R2 through a UK company called WebWiz. I would recommend WebWiz (no, they are not paying me) as they seem to support the latest Microsoft environments quickly (one of the first to support SQL 2012 in the UK) and they are relatively cheap.
  2. A Windows Azure account where I can change the level of hosting performance.
  3. Locally on my development server.

Try this yourself! The SampleMvcWebApp MVC5 web site is live and available for you to access and play with. Its address is http://samplemvcwebapp.net/

My tests are done using ab, the Apache HTTP server benchmarking tool. This allows me to load the site and also get averaged performance figures. I used the benchmark tool to read the posts list (17 entries) in two modes: a) one user trying 100 times, b) 50 users trying twice, all at the same time. The results are:

| Host + action | Sync (ms) | SD (ms) | Async (ms) | SD (ms) |
|---|---|---|---|---|
| WebWiz – List Posts ave(100), one user | 90 | 4 | 95 | 9 |
| WebWiz – List Posts ave(100), 50 users | 1065 | 450 | 1050 | 450 |
| Azure, standard – List Posts ave(100), one user | 110 | 11 | 120 | 50 |
| Azure, standard – List Posts ave(100), 50 users | 1200 | 500 | 1200 | 500 |

 

Notes:
– The Sync (ms) and Async (ms) columns are the average total time to get one list result from the web site using a synchronous and async EF method respectively.
– The SD (ms) column holds the standard deviation and shows the amount of variability of the results.

Analysis of MVC web site performance

    1. Async listing is slightly slower for one user
      There is a small difference, less than 10%, between the sync and async versions. However this must all be due to the use of async as the data is the same. It amounts to 5 to 10 milliseconds, which is about the same as we saw in the raw EF figures earlier.
    2. Using Async seems to have had no effect on the scalability of the web site.
      You will see that the figures for 50 simultaneous users are almost the same for both sync and async commands. That makes sense as the database is small, the query fairly simple and there is almost no waiting for the database (raw numbers for the database access part on small databases are 2 to 3 milliseconds). Clearly with much more complex data accesses async would start to pull away in terms of scalability of the web site, because the thread would be freed up for longer while the database is doing its work.
    3. Most of the time is spent in the MVC/Http part of the process.
      If you look at the timings for one request captured using Google Chrome’s developer tools you can see the parts of the timeline below:

[Figure: Chrome developer tools timeline for a single request to the WebWiz-hosted posts list]

    This clearly shows what I said in the Simple-Talk article – that optimising/caching the other files, like CSS and JavaScript files, would have far more effect than worrying about whether to use a sync or async data request.

I will end this section by saying it is quite difficult to test scalability on live web sites because they are designed to take a lot of load. You might like to look at Steven Sanderson’s video about threads and scalability which has an excellent demo http://channel9.msdn.com/Events/TechDays/Techdays-2012-the-Netherlands/2287.

Overall Conclusions

EF async data accesses have only a small overhead over the standard, sync commands. That is big news, and contrary to some of the older documentation that is out there. So feel free to use async EF commands in your next application.

Happy coding.

Alpha release of GenericServices

Last Updated: July 31, 2020 | Created: July 3, 2014
Quick Summary
This post introduces my new GenericServices framework designed to lighten the development load of building a service layer.
This post summarises what GenericServices is about and the motivation behind it. It also provides links to code source and the example web site.

I am pleased to announce the alpha release of my new Open Source project called GenericServices available on GitHub. It will also be available on NuGet when it has a more stable interface (watch this space).

GenericServices is a .NET class library which helps a developer build a service layer, i.e. a layer that acts as a facade/adapter between your business/data layers and your User Interface or HTTP service.

It does this by providing standard database CRUD (Create, Read, Update and Delete) methods and a standard way of calling business methods, each with clear extension points. The aim is to cut out the standard boiler-plate code so the user only has to write the data or business specific code.

What application frameworks can GenericServices work with?

GenericServices is designed to work as a service layer framework in any .NET application, such as ASP.NET MVC, Windows Azure Web apps, etc. It assumes a disconnected state model, e.g. a web site or HTTP RESTful service where the read of data prior to update is disconnected from the update of the data.

I have also assumed a horizontal scaling model, e.g. scaling by having multiple web instances, as this is how Azure and most web sites scale. I have therefore not thought about serialisation of objects for vertical scaling, i.e. where each layer of the application is run on a separate server and remote calls are used between layers.

GenericServices uses the following .NET frameworks/systems.

  • It needs .NET 4.5 because it implements all commands in normal and the new async/await Tasking format introduced in .NET 4.5
  • It uses Entity Framework 6 (EF6) for database access, again because it supports async commands.
  • It also makes use of the open source AutoMapper library for transforming data and business classes to/from the user interface oriented DTOs.

What is the motivation behind building GenericServices?

I design and develop fairly complex analysing, modelling and data visualisation web applications (see Spatial Modeller). These require a Domain-Driven Design approach to the data and business objects, while the visualisation needs a comprehensive user interface which can include a Single Page Application (SPA) fed by a REST interface. This means there is often a mismatch between the business/data layer classes and the data needed by the user interface and SPA.

My experience is that the Service Layer, plus Data Transfer Objects (DTOs), is the best way to solve this mismatch. However I have found that the service layer is often filled with lots of code that is very similar, with just the data being different. It’s also pretty boring to write. I therefore researched a number of approaches to handle the standard code and finally came up with a solution using C#’s Generic classes. I have therefore called it GenericServices.

Q: How would I use it? A: Look at example web site!

I have taken the trouble to produce a live example web site. This has examples of all the GenericServices commands, with documentation explaining how they work – for example see the Posts Controller code explanation page.

As well as showing the use of GenericServices it also contains another companion project of mine: code for executing long-running methods with progress and cancellation controls in ASP.NET MVC using SignalR.

This web site is an ASP.NET MVC5 application and is itself an open source project called SampleMvcWebApp. The source is available on GitHub.

Feedback

While it is in the alpha phase I suggest you leave comments here or contact me via this web site’s Contact Page. Once it is up on NuGet I will try to set up a Stack Overflow group and monitor that.

Mocking SignalR Client for better Unit Testing

Last Updated: July 31, 2020 | Created: June 10, 2014
[Figure: modal dialog showing task progress, driven by SignalR]

Why I needed to Mock SignalR

I build geographic modelling applications and they have lots of tasks that take a long time, sometimes many minutes, to run. I am currently developing an open source framework to help speed up the development of such ASP.NET MVC applications. Part of that framework therefore includes modules for handling long running processes, with a progress bar, messages and user cancellation. The image above shows a simple example of a modal panel with a green progress bar at the top and a list of sent messages as the task works its way through the calculations (well, in this case it is test code so the messages aren’t that exciting).

I have used SignalR for the communication channel between the JavaScript and the MVC5 server. I have found SignalR to be excellent and makes two-way comms really easy.

However my application is fairly complicated because of all the things that can happen, like errors, user wanting to cancel, losing connection, etc. In particular the JavaScript client uses a state machine to handle all the options, and that needs checking. For this reason I wanted to unit test both ends. (Read my blog on Unit Testing for an in-depth look at how I use Unit Testing).

The C# end was fairly straightforward to test, as it was partitioned well. However for the JavaScript end I needed to mock the SignalR JavaScript library. I could not find anything online so I wrote something myself.

Mocking the SignalR JavaScript Client

It turns out that it wasn’t that hard to mock the SignalR Client, although I should say I don’t use the autogenerated SignalR hub scripts, but use connection.createHubProxy(‘HubName’) as I find that easier to manage than loading a dynamically created script. I have listed the mocked SignalR Client code below:

//This is a mock for a small part of SignalR's javascript client.
//This code does not mock autogenerated SignalR hub scripts as the
//ActionRunner code uses the connection.createHubProxy('HubName') method,
//followed by .on or .invoke to setup the receive and send methods respectively

var mockSignalRClient = (function ($) {

    var mock = {};

    //first the items used by unit testing to see what has happened
    mock.callLog = null;
    mock.onFunctionDict = null;
    mock.doneFunc = null;
    mock.failFunc = null;
    mock.errorFunc = null;
    //This logs a string with the caller's function name and the parameters
    //you must provide the function name, but it finds the function arguments itself
    mock.logStep = function (funcName) {
        var log = funcName + '(';
        var callerArgs = arguments.callee.caller.arguments;
        for (var i = 0; i < callerArgs.length; i++) {
            log += (typeof callerArgs[i] === 'function') ? 'function, ' : callerArgs[i] + ', ';
        };
        if (callerArgs.length > 0)
            log = log.substr(0, log.length - 2);
        mock.callLog.push(log + ')');
    };
    mock.reset = function() {
        mock.callLog = [];
        mock.onFunctionDict = {}
        mock.doneFunc = null;
        mock.failFunc = null;
        mock.errorFunc = null;
    };

    //doneFail is the object returned by connection.start()
    var doneFail = {};
    doneFail.done = function (startFunc) {
        mock.logStep('connection.start.done');
        mock.doneFunc = startFunc;
        return doneFail;
    };
    doneFail.fail = function(failFunc) {
        mock.logStep('connection.start.fail');
        mock.failFunc = failFunc;
        return doneFail;
    };

    //Channel is the object returned by connection.createHubProxy
    var channel = {};
    channel.on = function (namedMessage, functionToCall) {
        mock.logStep('channel.on');
        mock.onFunctionDict[namedMessage] = functionToCall;
    };
    channel.invoke = function (actionName, actionGuid) {
        mock.logStep('channel.invoke');
    };

    //connection is the object returned by $.hubConnection
    var connection = {};
    connection.createHubProxy = function (hubName) {
        mock.logStep('connection.createHubProxy');
        return channel;
    };
    connection.error = function (errorFunc) {
        mock.logStep('connection.error');
        mock.errorFunc = errorFunc;
    };
    connection.start = function () {
        mock.logStep('connection.start');
        return doneFail;
    };
    connection.stop = function () {
        mock.logStep('connection.stop');
        return doneFail;
    };

    //now we run once the method to add the hubConnection function to jQuery
    $.hubConnection = function() {
        return connection;
    };

    //Return the mock base which has all the error feedback information in it
    return mock;

}(window.jQuery));

I think you will find most of this fairly easy to understand. The first part of the mock (mock.callLog, mock.onFunctionDict, mock.logStep, mock.reset and the related items) provides the variables and methods for Unit Testing to use. The rest of the code implements the methods which mock the SignalR methods I use in my code.

How did I use this Mock SignalR?

SignalR works by adding .hubConnection() to jQuery so it was simple to make the mock SignalR client do the same (see the $.hubConnection assignment near the end of the mock above). My actual code checks that jQuery is present and then that $.hubConnection is defined, which ensures SignalR is loaded. Here is a piece of code from my ActionRunner.comms.js that does the initial setup, to show how it uses SignalR and therefore what I needed to mock.

//This deals with setting up the SignalR connections and events
function setupTaskChannel() {

    actionRunner.setActionState(actionStates.connectingTransient);

    actionRunner.numErrorMessages = 0;

    //Setup connection and actionChannel with the functions to call
    var connection = $.hubConnection();

    //connection.logging = true;
    actionChannel = connection.createHubProxy('ActionHub');
    setupTaskFunctions();

    //Now make sure connection errors are handled
    connection.error(function(error) {
        actionRunner.setActionState(actionStates.failedLink);
        actionRunner.reportSystemError('SignalR error: ' + error);
    });
    //and start the connection and send the start message
    connection.start()
        .done(function() {
            startAction();
        })
        .fail(function(error) {
            actionRunner.setActionState(actionStates.failedConnecting);
            actionRunner.reportSystemError('SignalR connection error: ' + error);
        });
}
[Figure: Jasmine unit test checking which SignalR functions were called]

Using this mocking framework

There are two main ways I use it. Firstly you get a log of each method called, which helps ensure the right methods are called.

Secondly most of the calls to SignalR link functions to certain SignalR events. By capturing these functions the unit test can call them to simulate SignalR messages, errors etc. That allows a very good level of checking.

Getting the whole picture

In case you are interested in downloading the code or seeing how it was used then here are a series of links to various bits of code. These are taken from an open source project that I am currently working on, so the code is subject to change. I have listed all the various parts of the testing. UPDATE: These links had broken because the git repository had changed – sorry about that. Now fixed.

Conclusion

As I said in my previous post ‘Unit Testing in C# and JavaScript‘ I find mocking in JavaScript very easy and helpful. The ActionRunner is complex enough to need unit testing and I found mocking the various parts I wanted to replace fairly quick to implement.

I hope this helps you with SignalR and encourages you to mock other frameworks to help you test more easily. Happy coding.


Reflections on Unit Testing in C# and JavaScript

Last Updated: July 31, 2020 | Created: May 31, 2014

It has been a while since I wrote my first blog post called ‘Why I’m not scared of JavaScript any more‘ and since then I have written a fair amount of JavaScript. One of the things that I have found interesting is how I use different approaches to Unit Testing in each language. I’m not saying I have the ‘best’ way of Unit Testing each language, in fact I think some areas are a bit weak, but maybe you will find my experiences and reflections useful.

If you want to get to the meat of the post skim the next three paragraphs that set the scene and jump to ‘The differences in how I Unit Test between the two languages’ section.


First, respect to Gilles Ruppert…

Before I start I should say that I was heavily helped on the JavaScript side by an excellent programmer called Gilles Ruppert. As I was new to JavaScript I sought help and my son recommended Gilles. Gilles worked for me for about 10 weeks and set up the architecture of the single page application that I needed. He also set up an excellent JavaScript development environment using the Karma test runner, the Mocha test framework, Expect assertions and finally Sinon stubs/spies/mocks. I am building on the shoulders of a giant.

Setting the scene

I have just finished the first release of Spatial Modeller™, a medium-sized web application with a server side written in C# using ASP.NET MVC4 and a large single page application written in JavaScript using Backbone with Marionette.

When Gilles finished we had just under 300 JavaScript Unit Tests, a handful written by me. Now I have added a lot more code and written an additional 400+ Unit Tests, taking the JavaScript tests to over 700.

On the C# side I use the NUnit unit testing framework with a small amount of Moq for mocking etc. I use Resharper for running the tests inside Visual Studio. I have used this combination for some years.

Numerically I have fewer Unit Tests in C# than in JavaScript, currently just under 500. However each C# test is often much more complicated and checks many items per test. This difference in approach is one of the reasons I wrote this post, so read on to find out why.

My style of Unit Testing

I really believe in Unit Testing and I don’t think I have a project that doesn’t use Unit Testing. However I am not a fan of Test Driven Development (TDD), as I have found that working that way I don’t come up with a coherent design. Writing Unit Tests is not an excuse for not doing some good design of the system first.


The differences in how I Unit Test between the two languages

After maybe 10 months of JavaScript Unit Testing (and 5+ years of C# Unit Testing) I find it really interesting that I use different styles between the two. Here are the raw differences, and I’ll draw some conclusions at the end.

1. In C# my first Unit Test is the compiler

Because C# is typed the compiler gives me lots of feedback. Therefore I may refactor code before I have even run it if I see opportunities to improve the code. This doesn’t happen with JavaScript, partly because JavaScript is not typed in the same way, so tools like JSHint cannot give such comprehensive feedback.

(hint: In Visual Studio you can install Web Essentials which will then run JSHint and JSCS when a JavaScript file is closed. It may not be as good as the C# compiler at spotting things, but it can help spot silly mistakes.)

This means I am willing to write more C# code before I Unit Test than I would with JavaScript.

2. Large JavaScript applications are much easier to partition than C#

The ‘up’ side of the looser checking in JavaScript is that it is much easier to build the application in smaller chunks. The ‘interface’ between these parts is often an object which, by the nature of JavaScript, is not directly linked between modules. As well as making mocking really easy, this seems to help me think about each part separately.

In C# I use interfaces and layers to separate chunks of code, but for something complex with five or more significant classes I tend to think of them as a whole. The typed nature makes them more ‘linked’ than the JavaScript equivalent and mocking is harder.

3. The unit testing frameworks make a BIG difference

I have found that test frameworks like Mocha and Jasmine have some features that encourage small tests. This is because these frameworks support ‘nested setups’, which other frameworks like NUnit don’t have. Let me explain what ‘nested setups’ are and why they encourage smaller, one-item tests.

For most tests we need to set up some instances or data before we run the test. In NUnit we can run a setup method a) once at the start of the test class and/or b) once before each test is run. This is fine, but I find that I often need to run something specific before each test, which in NUnit you have to do at the beginning of the specific test. See the example below where I run a ‘once at start’ method called ‘SetUpFixture’ and then an additional setup phase at the start of the specific test ‘CheckValidateTagError’.

[TestFixtureSetUp]
public void SetUpFixture()
{
    using (var db = new SampleWebAppDb())
        DataLayerInitialise.ResetDatabaseToTestData(db, TestDataSelection.Simple);
}

[Test]
public void CheckValidateTagError()
{
    using (var db = new SampleWebAppDb())
    {
        //SETUP
        var existingTag = db.Tags.First();

        //ATTEMPT
        var dupTag = new Tag {Name = "duplicate slug", Slug = existingTag.Slug};
        db.Tags.Add(dupTag);
        var status = db.SaveChangesWithValidation();

        //VERIFY
        status.IsValid.ShouldEqual(false);
        status.Errors.Count.ShouldEqual(1);
        status.Errors[0].ErrorMessage.ShouldEqual("The Slug on tag 'duplicate slug' must be unique.");
    }
}

This is fine, but I find that the individual setups can become quite long. I then have three choices if I have multiple things to check: a) duplicate the individual setup code in each test (not good for DRY), b) write a small helper method which encapsulates the setup code (takes time and hides the setup), or c) test multiple items in the one test, which is what I have done above.

Both the Mocha and Jasmine JavaScript testing frameworks allow nesting of the setups, so in my last example I could have an outer group with the ‘SetUpFixture’ in it and then a small nested group, with a setup just for CheckValidateTagError, and then three separate tests for the three parts.

Here is an example from my actual code with some nested test groups, which Mocha does with the ‘describe’ command:

... various require code

describe('views/InfoPanel', function () {
    afterEach(function () {
        delete this.model;
        delete this.view;
    });

    describe('single item to display, no grades', function () {
        beforeEach(function () {
            this.layerInfo = helpers.loadFixture('/api/schemes/1').layerInfos[0];
            this.members = helpers.loadFixture('/api/layers/1/members');
            this.model = new InfoPanelModel({ selectedMembers: this.members });
            this.view = new InfoPanelView({ model: this.model });
        });
        afterEach(function () {
            delete this.layerInfo;
            delete this.layer;
            delete this.members;
        });

        it('should be a Backbone.View', function () {
            expect(this.view).to.be.a(Backbone.View);
        });

        describe('after showing view', function () {
            beforeEach(function () {
                this.mainRegion.show(this.view);
            });

            it('should have something on view', function () {
                expect(this.view.$el).to.have.length(1);
            });

            it('title in window header should start with layer name', function () {
                ... etc.
            });

You can see the first setup, the outer ‘beforeEach’, and then the nested ‘beforeEach’ inside the ‘after showing view’ group. This works really well and is the reason why my JavaScript Unit Tests almost always check one item per test. I really like Mocha’s approach and miss it in NUnit.

4. Mocking databases is hard!

One thing that definitely affects me is that the database side is hard to mock. I have used a number of ORMs, and direct T-SQL, and there isn’t a nice way to replace the real code with something for testing that will catch the same errors. Foreign keys, relational fixup etc. are really hard to mock (just have a look at my post Updating a many to many relationship in entity framework to see the complications that arise).

This means some of my Unit Tests in C# are more like integration tests, as I use the real database layer but with a test database loaded with test data. There is a cost to that, but my experience is that anything else lets errors through.

JavaScript using backbone.js does access data, but in a more controlled way, and Gilles set up the classic ‘fixtures’ and filled them with test data created by the C# application. That makes JavaScript testing much easier, but only because the server is dealing with the complex parts.

5. PS: You must be able to debug inside Unit Tests

It goes without saying that Unit Tests are for finding bugs before you use the code in the real application. Therefore you MUST be able to debug code, with breakpoints and variable inspection etc., when you are running a unit test.

I mention this only because Resharper, which I find really useful, is very bad in this area when unit testing JavaScript. Resharper makes running unit tests really easy, especially on JavaScript using QUnit or Jasmine. However I have found it almost impossible to use it to debug Jasmine tests, and I assume QUnit tests, when running in the Resharper harness.

However the Mocha JavaScript test environment that Gilles set up is excellent on that score. It allows running with a connected browser, using the debugger; statement to cause pauses when the debugger is open, and using the .only() statement to restrict the run to certain tests.

How this affects my Unit Testing

  1. My JavaScript Unit Tests are shorter and just test one item per test.
  2. I write new Unit Test groups more often in JavaScript than in C#.
  3. My C# Unit Tests are often more complex and test multiple items per test. This is because:
    1. The setup code is complicated or time consuming.
    2. NUnit does not have nested setups like Mocha does
      (see The unit testing frameworks make a BIG difference above).
  4. I often write more C# code before writing Unit Tests, as the compiler catches a lot of ‘silly’ mistakes in C# which can get through in JavaScript.
  5. C# code is more ‘linked together’ than JavaScript, which affects mocking and designing.

I know that testing multiple items in one test, point 3 above, is often seen as ‘wrong’ by some, but the cost of the code to set up each test and the time to run the tests are factors I have weighed up in coming to my current style.

I have more C# code than JavaScript code by maybe 3 to 1. My 500 C# Unit Tests take nearly 2 minutes to run while my 700 JavaScript Unit Tests take 30 seconds.

Conclusion

I like both C# and JavaScript. They each have their strengths and weaknesses. I also could not work on a project without the feedback that Unit Tests give me on the quality of the software.

With all of this in mind it is worth thinking about how you Unit Test in each language. The language, the need, the project and the environment will change the way Unit Tests can and should be approached. Happy programming.

Updating a many to many relationship in entity framework

Last Updated: July 31, 2020 | Created: May 22, 2014
Quick Summary
I found conflicting information on how to update many-to-many relationships in entity framework. I therefore decided to research the problem properly. This post shares with you my research.
I believe I now understand how entity framework works with automatic many-to-many relationships and how to implement my own linking table if I need to.

I had problems working out the best way to update a many to many relationship in Entity Framework (referred to as EF). I ended up doing some research and I thought others might like to see what I found out.

Firstly the environment I am working in:

  • I am using EF 6.1 in Code First mode. I am pretty sure what I have found applies across the board, but I have only tested on EF 6.1
  • I am working in a web application (MVC 5) so I have to handle disconnected classes.

If you just want the quick answer then go to the Part 1 Conclusion. The Conclusions also have a link to a live web site where you can see a many-to-many relationship used for real (with access to the source too). However it’s worth reading some of the examples so you understand why it works.

UPDATE 2017: I have written a new version of this article for Entity Framework Core (EF Core) to celebrate the release of my book Entity Framework Core in Action.

EF Core does NOT work the same way as EF6.x and the new article provides information on how to handle many-to-many relationships in EF Core.


Part 1: Using EF’s automatic linking of many-to-many relationships

In this example I have simply linked the Tag and the Post classes by having ICollection&lt;Tag&gt; Tags in the Post class and ICollection&lt;Post&gt; Posts in the Tag class. See the navigation properties in Tag and Post in the diagram below:

Tag, Post and Blog entity framework classes (blog dimmed as not important)
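In case the diagram does not come through, here is a minimal sketch of what those two classes could look like in Code First. The Title property is purely illustrative; the real classes are in the SampleMvcWebApp project.

using System.Collections.Generic;

public class Post
{
    public int PostId { get; set; }
    public string Title { get; set; }    //illustrative property only

    //this navigation property, plus the matching one in Tag,
    //is all EF needs to create the hidden many-to-many link table
    public virtual ICollection<Tag> Tags { get; set; }
}

public class Tag
{
    public int TagId { get; set; }
    public string Slug { get; set; }

    //the other end of the many-to-many relationship
    public virtual ICollection<Post> Posts { get; set; }
}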

As explained in the Microsoft tutorial, EF will create a linking table called TagPosts or PostTags. This link table, which you never see, provides the many-to-many linking. This way a post can have zero to many tags and a tag can be on zero to many posts.

So the action I want to perform is to change the tags on a post. Sounds simple, and it is in this example if you watch out for a few things.

First answer: update many-to-many relationship in connected state

Below are two Unit Tests which don’t need to worry about disconnected classes. These are good starting points as it’s important to know how EF handles the simplest case. The first test adds a tag to a post while the second one replaces the current tags with a new set. Note that the first post starts with two tags: ‘good’ and ‘ugly’.

[Test]
public void Check25UpdatePostToAddTagOk()
{
    using (var db = new SampleWebAppDb())
    {
        //SETUP
        var badTag = db.Tags.Single(x => x.Slug == "bad");
        var firstPost = db.Posts.First();

        //ATTEMPT
        db.Entry(firstPost).Collection( x => x.Tags).Load();
        firstPost.Tags.Add( badTag);
        db.SaveChanges();

        //VERIFY
        firstPost = db.Blogs.Include(x => x.Posts.Select(y => y.Tags)).First()
                            .Posts.First();
        firstPost.Tags.Count.ShouldEqual(3);
    }
}

[Test]
public void Check26ReplaceTagsOk()
{
    using (var db = new SampleWebAppDb())
    {
        //SETUP
        var firstPost = db.Posts.First();
        var tagsNotInFirstPostTracked =
             db.Tags.Where(x => x.Posts.All(y => y.PostId != firstPost.PostId))
                    .ToList();

        //ATTEMPT
        db.Entry(firstPost).Collection(x => x.Tags).Load();
        firstPost.Tags = tagsNotInFirstPostTracked;
        db.SaveChanges();

        //VERIFY
        firstPost = db.Blogs.Include(x => x.Posts.Select(y => y.Tags)).First()
                            .Posts.First();
        firstPost.Tags.Count.ShouldEqual(1);
    }
}

Now, you should see the two important lines, the calls to db.Entry(firstPost).Collection(x => x.Tags).Load() in each test, which load the current Tags collection of the post. This is really important as it loads the post’s current tags so that EF can track when they are changed. There are a number of alternatives to using .Load() that would work:

  1. Add the appropriate .Include() when loading the posts, e.g.
    var firstPost = db.Posts.Include( post => post.Tags).First()
  2. Make the Tags property in the Post class virtual, e.g.
    public virtual ICollection<Tag> Tags { get; set; }

Having used .Load(), .Include() or a virtual property, EF tracks the data and then does all the work of adding or removing rows in the hidden link table. This is very clever and very useful.

I wanted to really prove to myself that my findings were correct, so I wrote another Unit Test to test the failure case. The unit test below shows conclusively that if you don’t load the current Tags you get the wrong result. As I said earlier, the first post started with the ‘good’ and ‘ugly’ tags and should have ended up with ONLY the ‘bad’ tag. However the Unit Test shows it ended up with all three.

[Test]
public void Check05ReplaceAllTagsNoIncludeBad()
{
    using (var db = new MyDbContext())
    {
        //SETUP
        var snap = new DbSnapShot(db);
        var firstPost = db.Posts.First();
        var badTag = db.Tags.SingleOrDefault(x => x.Slug == "bad");

        //ATTEMPT
        firstPost.Tags = new List<Tag> { badTag };
        db.SaveChanges();

        //VERIFY
        snap.CheckSnapShot(db, 0, 1);
        var readPost = db.Posts.Include(x => x.Tags)
                         .SingleOrDefault(x => x.PostId == firstPost.PostId);
        CollectionAssert.AreEquivalent(new[] { "good", "bad", "ugly" },
                                       readPost.Tags.Select(x => x.Slug));
    }
}

As you can see from the above example it ended up with all three, which is not the right answer.

Second answer: update many-to-many relationship in disconnected state

When working with a web application like MVC an update is done in two stages. Firstly the current data is sent to the user, who updates it. Secondly, when the user presses submit, the new data is sent back, but it is now disconnected from the database, i.e. EF is not tracking it. This makes for a slightly more complicated case, but it is still fairly easy to handle.

Below is my method for updating the post’s tags. In this case I have filled a MultiSelectList with all the tags and it returns the ids of the tags that the user has chosen. I should also point out that I use the same method for create and update, hence the test on post.PostId to see if I need to load the current tags collection.

private void ChangeTagsBasedOnMultiSelectList(SampleWebAppDb db, Post post)
{
   var requiredTagIds = UserChosenTags.GetFinalSelectionAsInts();

   if (post.PostId != 0)
       //This is an update so we need to load the tags
       db.Entry(post).Collection(p => p.Tags).Load();

   var newTagsForPost = db.Tags
                          .Where(x => requiredTagIds.Contains(x.TagId)).ToList();
   post.Tags = newTagsForPost;
}

The important thing is that I loaded new tags from the database so they are tracked.

Conclusion from part 1

If you want to update an EF provided many-to-many link then:

  1. Pick one end of a many-to-many relationship to update. EF will sort out the other end for you.
  2. Make sure that the collection you will change is loaded, either by putting virtual on the property, using .Include() in the initial load or using .Load() later to get the collection.
    Note: .Include() is the best performing of the three as it means the data is loaded in the initial SQL command. The other two, .Load() and virtual, require a second SQL access.
  3. Make sure that any new items, e.g. Tags in my example, are loaded as tracked entities.
    This is the normal case, but in the disconnected state, i.e. in a web app, that might not be true (see examples above).
  4. Call EF’s .SaveChanges() to save the changes.

Because you have tracked entities then EF’s change tracking will notice it and sort out the adding or removing of rows in the hidden link table for you.
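Putting those four steps together, here is a minimal connected-state sketch. It assumes using System.Linq and using System.Data.Entity, and simply follows the same pattern as the Unit Tests above.

using (var db = new SampleWebAppDb())
{
    //1. pick one end of the relationship and load its collection
    //   (.Include() loads it in the initial SQL command)
    var post = db.Posts.Include(p => p.Tags).First();

    //2. make sure any new items are tracked by reading them from the context
    var newTags = db.Tags.Where(t => t.Slug == "bad").ToList();

    //3. change the collection - EF's change tracking spots the difference
    post.Tags = newTags;

    //4. SaveChanges adds/removes the rows in the hidden link table for you
    db.SaveChanges();
}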

Live web site with many-to-many relationship update

As part of my GenericServices open-source project I have built two example web sites. One of them has a simple list of blog Posts, each of which has one or many Tags. The Posts and Tags are linked by a many-to-many table. This web site is live and you can try it yourself.

  • Edit a post at http://samplemvcwebapp.net/Posts and change the tags – Note: if accessed from desktop then you need to use control-click to select multiple tags.
  • You can also see the code that updates this many-to-many relationship via the open-source project SampleMvcWebApp – see the code right at the end of the DetailPostDto.cs class in the method ChangeTagsBasedOnMultiSelectList.

Part 2: Using your own link table for many-to-many relationships

There are cases where you want to have your own link table, possibly because you want to include some other property in the link table. Below is a case where I have simply created my own link table with no extra properties. As you can see, the PostTagLink table has a row for every Tag / Post link, just like the hidden table that EF produced in the first example. However, now that we have produced this table ourselves, we need to keep it up to date.

Tag, PostTagLink and Post classes
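For reference, here is a minimal sketch of what such a link entity might look like. The attribute-based composite key is my assumption about the mapping; the real class is in the SampleMvcWebApp project.

using System.ComponentModel.DataAnnotations;
using System.ComponentModel.DataAnnotations.Schema;

public class PostTagLink
{
    //composite primary key made up of the two foreign keys
    [Key, Column(Order = 0)]
    public int PostId { get; set; }

    [Key, Column(Order = 1)]
    public int TagId { get; set; }

    //navigation properties back to the two ends of the relationship
    [ForeignKey("PostId")]
    public virtual Post InPost { get; set; }

    [ForeignKey("TagId")]
    public virtual Tag HasTag { get; set; }
}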

So let us carry out the same connected (Unit Test) and disconnected (MVC) updates that we did in the first part.

First answer: update many-to-many relationship in connected state

The two unit tests below now show that we need to manipulate our PostTagLink table entries, not the navigation properties in the Post. After we have saved the changes the Post’s AllocatedTags list will reflect the changes through EF’s relational fixup done on any tracked classes.

[Test]
public void Check25UpdatePostToAddTagOk()
{
    using (var db = new SampleWebAppDb())
    {
        //SETUP
        var badTag = db.Tags.Single(x => x.Slug == "bad");
        var firstPost = db.Posts.First();

        //ATTEMPT
        db.PostTagLinks.Add(new PostTagLink { InPost = firstPost, HasTag = badTag });
        db.SaveChanges();

        //VERIFY
        firstPost = db.Posts.Include(x => x.AllocatedTags).First();
        firstPost.AllocatedTags.Count.ShouldEqual(3);
    }
}

[Test]
public void Check26UpdatePostToRemoveTagOk()
{
    using (var db = new SampleWebAppDb())
    {
        //SETUP
        var firstPost = db.Posts.First();
        var postTagLinksToRemove =
             db.PostTagLinks.First(x => x.PostId == firstPost.PostId);

        //ATTEMPT
        db.PostTagLinks.Remove(postTagLinksToRemove);
        db.SaveChanges();

        //VERIFY
        firstPost = db.Posts.Include(x => x.AllocatedTags).First();
        firstPost.AllocatedTags.Count.ShouldEqual(1);
    }
}

I think the code speaks for itself, i.e. you add or remove rows from the PostTagLinks table to change the links.

Second answer: update many-to-many relationship in disconnected state

Just like in the first example, when using MVC I have filled a MultiSelectList with all the tags and it returns the ids of the tags that the user has chosen. So now I need to add/remove rows from the PostTagLinks table. However I try not to change links that are still needed, hence I produce tagLinksToDelete and tagLinksToAdd, as this is more efficient.

private void ChangeTagsBasedOnMultiSelectList(SampleWebAppDb db, Post post)
{
    var requiredTagIds = UserChosenTags.GetFinalSelectionAsInts();

    var tagLinksToDelete =
        db.PostTagLinks.Where(x => !requiredTagIds.Contains(x.TagId) && x.PostId == PostId).ToList();
    var tagLinksToAdd = requiredTagIds
        .Where(x => !db.PostTagLinks.Any(y => y.TagId == x && y.PostId == PostId))
        .Select(z => new PostTagLink {InPost = post, HasTag = db.Tags.Find(z)}).ToList();

    //We get the PostTagLinks entries right, which is what EF needs
    tagLinksToDelete.ForEach(x => db.PostTagLinks.Remove(x));
    tagLinksToAdd.ForEach(x => db.PostTagLinks.Add(x));
    //********************************************************************
    //If using EF 6 you could use the more efficient RemoveRange, e.g.
    //db.PostTagLinks.RemoveRange(tagLinksToDelete);
    //db.PostTagLinks.AddRange(tagLinksToAdd);
    //********************************************************************
}

Conclusion from part 2

If you have your own link table for handling many-to-many relationships then you need to:

  1. Add or remove entries from your link table, in my case called PostTagLinks.
  2. Make sure that any new items, e.g. a Tag, added to your link table are loaded as tracked entities.
  3. Call EF’s .SaveChanges() to persist the changes.

Well, that took longer to write than I expected, but I hope it helps others in really understanding what is going on underneath EF’s many-to-many relationships. Certainly I now feel much more confident on the topic.

Additional note: You will see I use EF commands directly and do not use a repository or UnitOfWork pattern when accessing the database. You might like to read my post on ‘Is the Repository pattern useful with Entity Framework?‘ as to why I do that.

Is the Repository pattern useful with Entity Framework?

Last Updated: July 31, 2020 | Created: May 10, 2014
Quick Summary
This is, hopefully, a critical review of whether the repository and UnitOfWork pattern is still useful with the modern implementations of Microsoft’s Entity Framework ORM. I look at the reasons why people are suggesting the repository pattern is not useful and compare this with my own experience of using the repository and UnitOfWork over many years.

This is a series:

  1. Part 1: Analysing whether Repository pattern useful with Entity Framework (this article).
  2. Part 2: Four months on – my solution to replacing the Repository pattern.
  3. UPDATE (2018): Big re-write to take into account Entity Framework Core, and further learning.

I have just finished the first release of Spatial Modeller™ , a medium sized ASP.NET MVC web application. I am now undertaking a critical review of its design and implementation to see what I could improve in V2. The design of Spatial Modeller™ is a fairly standard four layer architecture.

Four layer software design of the Spatial Modeller™ web app

I have used the repository pattern and UnitOfWork pattern over many years, even back in the ADO.NET days. In Spatial Modeller™ I think I have perfected the design and use of these patterns and I am quite pleased with how they help the overall design.

However we don’t learn unless we hear from others that have a counter view, so let me try and summarise the arguments I have seen against the repository/Unit of Work pattern.

What people are saying against the repository pattern

In researching as part of my review of the current Spatial Modeller™ design I found some blog posts that make a compelling case for ditching the repository. The most cogent and well thought-out post of this kind is ‘Repositories On Top UnitOfWork Are Not a Good Idea’. Rob Conery’s main point is that the repository & UnitOfWork just duplicate what Entity Framework’s (EF) DbContext gives you anyway, so why hide a perfectly good framework behind a façade that adds no value; this is what Rob calls ‘this over-abstraction silliness’.

Another blog is ‘Why Entity Framework renders the Repository pattern obsolete’. In this Isaac Abraham adds that a repository doesn’t make testing any easier, which is one thing it was supposed to do.

I should also say I found another blog, ‘Say No to the Repository Pattern in your DAL’, which says that using a repository removes access to LINQ querying, the ability to include/prefetch, and async support (EF 6). However this is not true if your repository passes out IQueryable items, as this preserves all these features.
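To show what I mean, here is a minimal sketch (the repository name and the example query are made up) of a repository method that hands back IQueryable&lt;T&gt;, so the caller can still compose LINQ, add .Include()/prefetch or use async:

using System.Linq;

public class PostRepository
{
    private readonly SampleWebAppDb _db;

    public PostRepository(SampleWebAppDb db)
    {
        _db = db;
    }

    //returning IQueryable<T> means nothing is executed yet,
    //so the caller can still add Where, Include, async calls etc.
    public IQueryable<Post> GetPosts()
    {
        return _db.Posts;
    }
}

//example use (needs using System.Data.Entity for Include):
//var postsWithTags = new PostRepository(db).GetPosts()
//                        .Include(p => p.Tags)
//                        .Where(p => p.Tags.Any())
//                        .ToList();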

So, are they right?

Reviewing my use of repository and UnitOfWork patterns

I have been using repositories for a while now, and my reflection is that when the ORMs weren’t so good I really needed repositories.

I built a geographic modelling application for a project to improve HIV/AIDS testing in South Africa. I used LINQ to SQL, but because it didn’t support the spatial parts I had to write a lot of T-SQL stored procedures and use ADO.NET code to access the spatial values. The database code was therefore a pretty complicated mix of technologies and a repository pattern acted as a good façade to make it look seamless. I definitely think the software design was helped by the repository pattern.

However EF has come a long way since then. Spatial Modeller™ used EF 5, which was the first version to support spatial types (and enums, which is nice). I used the repository pattern and it worked well, but I don’t think it added as much value as in the South Africa project, as the EF code was pretty clean. Therefore I sympathise with the sentiments of Rob and the others.

My views on the pros and cons of repository and UnitOfWork patterns

Let me try to review the pros and cons of the repository/UnitOfWork patterns in as even-handed a way as I can. Here are my views.

Benefits of repository and UnitOfWork patterns (with best first)

  1. Domain-Driven Design: I find well-designed repositories can present a much more domain-specific view of the database. Commands like GetAllMembersInLayer make a lot more sense than a complex LINQ command. I think this is a significant advantage. However Rob Conery’s post suggests another solution using Command/Query Objects and references an excellent post by Jimmy Bogard called ‘Favor query objects over repositories’.
  2. Aggregation: I have some quite complex geographic data consisting of six or seven closely interrelated classes. These are treated as one entity, with one repository accessing them as a group. This hides complexity and I really like this.
  3. Security of data: one problem with EF is that if you load data normally and then change it by accident, it will update the database on the next SaveChanges. My repositories are very specific about what they return, with GetTracked and GetUntracked versions of commands. And for things like audit trails I only ever return them as untracked (see the box below right if you are not clear on what EF tracking is).
  4. Hiding complex T-SQL commands: As I described before, sometimes database accesses need a more sophisticated command that has to be written in T-SQL. My view is that these should be in only one place to help maintenance. Again I should point out that Rob Conery’s Command/Query Objects (see item 1) could also handle this.
Not sure what EF Tracked is?
A normal EF database command using DbContext returns ‘attached’ (tracked) classes. When SaveChanges() is called it checks these attached classes and writes any changed data back to the database. You need to append .AsNoTracking() to the LINQ command if you want a read-only copy. Untracked classes are also slightly faster to load.
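As a quick illustration of the difference, here is a small sketch (not code from Spatial Modeller™):

//tracked: any changes to 'post' are written back on the next SaveChanges()
var post = db.Posts.First();

//untracked: a read-only copy, slightly faster to load and safe from accidental updates
var postReadOnly = db.Posts.AsNoTracking().First();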

Non-benefits of repository and UnitOfWork patterns are:

  1. More code: Repositories and UnitOfWork mean more code that needs developing, maintaining and testing.
  2. Testing: I totally agree with Isaac Abraham. Repositories are no easier to mock than IDbSet. I have tried on many occasions to mock the database, but with complex, interrelated classes it is really hard to get right. I now use EF and a test database in my unit tests for most of the complex models. It’s a bit slower, but none of my mocks could handle the relational fixup that EF does.

Conclusion

I have to say I think Command/Query Objects mentioned by Rob Conery and described in detail in Jimmy Bogard’s post called ‘Favor query objects over repositories’ have a lot going for them. While the repository/UnitOfWork pattern has served me well maybe EF has progressed enough to use it directly, with the help of Command/Query Objects to make access more Domain-Driven in nature.

What I do in a case like this is build a small app to try out a new approach. I ensure that the app has a few complex things that I know have proved a problem in the past, but small enough to not take too long. This fits in well with me reviewing the design of Spatial Modeller™ before starting on version 2. I already have a small app looking at using t4 code generation to produce some of the boiler plate code and I will extend the app to try a non-repository approach. Should be interesting.

UPDATE – 4 months & 8 months later
Read the second part of this series after four months of development and my conclusions on how to replace the Repository/Unit of Work pattern with new ways of working.  Quite a journey, but I think a useful development. Have a read and see what you think. PS. There is a live example web site showing some of the approaches which you can play with and access the code.
Happy coding!

Why I’m not scared of JavaScript any more

Last Updated: July 31, 2020 | Created: April 10, 2013

My recent programming experience has been in C# and .NET, mainly on windows apps to support my business. However I needed to move to a web-based application, especially for our work in Africa.

The problem was that the only bits of JavaScript I had seen up to this point were a few lines to, say, handle a button click. It looked very unstructured and, for a person used to OOP and C#, rather nasty. So, what do I do in these cases… I bought a book, or in this case I bought two books. The effect was magic.

Book 1: Learning JavaScript Design Patterns

The first book I read was Learning JavaScript Design Patterns by Addy Osmani. I bought this because a) it was on design patterns and b) it was very recent. The last is important because JavaScript is developing at such a fast pace.

This book was just what I needed. It opened my eyes to how to construct JavaScript cleanly and provide good structure to my programming. It made me much more confident about how to write well structured JavaScript.

I really recommend this book to anyone who is a programmer but doesn’t really know about the new ways of building JavaScript programs.

Book 2: Effective JavaScript

However I knew that a book on design patterns wouldn’t tell me the nitty gritty of the language. I already had the book JavaScript: The Good Parts by Douglas Crockford, but that hadn’t helped me to get inside the thinking of JavaScript.

I found a recent book called Effective JavaScript by David Herman. I have two Effective C# books and they really get into the depths of C#, so I got the JavaScript version. I was not disappointed.

Where the Design Patterns book gave me the overview, the Effective JavaScript book took me deep into the inner workings of JavaScript. Even from skim reading it I got a good idea of what to do and, more importantly, what NOT to do. This is now my bible of good coding standards in JavaScript.

Did the books help?

So, did reading these books make me a great JavaScript programmer? Of course not. But I was confident enough to design a fairly complex geospatial data visualisation system in JavaScript to work with OpenLayers.

I then went on to write the first basic module and some constructors to do the initial display of data on a map. They have a nice separation of concerns, and when I wanted to refactor something it was nice and easy. It also came together quite easily with few bugs, although I really missed not having Unit Tests (that is another blog post to come!). It’s likely I will get a JavaScript guru to build the proper system, but at least I feel confident that I know what is going on now.

Maybe in a future post I will write about why I actually LIKE JavaScript now, because there are some bits that are really nice. But more importantly, JavaScript is now the primary (only?) way ahead for responsive web design.