An in-depth study of Cosmos DB and the EF Core 3.0 database provider

Last Updated: November 8, 2019 | Created: November 4, 2019

This article looks at the capabilities of Cosmos DB database when used via the new EF Core 3.0 database provider. It looks at various differences between Cosmos DB, which is a NoSQL database, and a SQL database. It also looks at the limitations when using the EF Core 3.0 Cosmos DB provider.

I have been waiting for this release for a while because I think both SQL and NoSQL database have a role in building high performance applications. I have a CQRS approach which uses both types of databases and I wanted to update it (see section “Building something more complex with Cosmos DB” for more info). But when I started using Cosmos DB and the new EF Core database provider, I found things were more different than I expected!

The articles forms part of the articles I am writing to update my book. “Entity Framework Core in Action”, with the changes in EF Core 3. The list of articles in this series is:

NOTE: If you not sure of the differences between SQL (relational) and NoSQL databases I recommend googling “sql vs nosql difference” and read some of the articles. I found this article gave a good, high-level list of differences at the start, but the rest of it was a bit out of date.

TL;DR; – summary

A quick introduction to using EF Core’s Cosmos DB provider

I thought I would start with a simple example that demonstrates how to use the Cosmos DB database provider in EF Core 3. In this example I have a Book (my favourite example) with some reviews. There are the two classes and the DbContext.

public class CosmosBook
{
    public int CosmosBookId { get; set; }
    public string Title { get; set; }
    public double Price { get; set; }
    public DateTime PublishedDate { get; set; }

    //----------------------------------
    //relationships 

    public ICollection<CosmosReview> Reviews { get; set; }
}
[Owned]
public class CosmosReview
{
    public string VoterName { get; set; }
    public int NumStars { get; set; }
    public string Comment { get; set; }
}
public class CosmosDbContext : DbContext
{
    public DbSet<CosmosBook> Books { get; set; }

    public CosmosDbContext(DbContextOptions<CosmosDbContext> options)
        : base(options) { }
}

If you have used SQL databases with EF Core, then the code above should be very familiar, as its pretty much the same as a SQL database. The unit test below is also to a SQL unit tes, but I point out a few things at the end that are different for Cosmos DB.

[Fact]
public async Task TestAddCosmosBookWithReviewsOk()
{
    //SETUP
    var options = this.GetCosmosDbToEmulatorOptions<CosmosDbContext>();
    using var context = new CosmosDbContext(options);
    await context.Database.EnsureDeletedAsync();
    await context.Database.EnsureCreatedAsync();

    //ATTEMPT
    var cBook = new CosmosBook
    {
        CosmosBookId = 1,      //NOTE: You have to provide a key value!
        Title = "Book Title",
        PublishedDate = new DateTime(2019, 1,1),
        Reviews = new List<CosmosReview>
        {
            new CosmosReview{Comment = "Great!", NumStars = 5, VoterName = "Reviewer1"},
            new CosmosReview{Comment = "Hated it", NumStars = 1, VoterName = "Reviewer2"}
        }
    };
    context.Add(cBook);
    await context.SaveChangesAsync();

    //VERIFY
    (await context.Books.FindAsync(1)).Reviews.Count.ShouldEqual(2);
}

Here are some comments on certain lines that don’t follow what I would do with a SQL database.

  • Line 5: The GetCosmosDbToEmulatorOptions method comes from my EfCore.TestSupport library and sets up the Cosmos DB connection to the Azure Cosmos DB Emulator – see the “Unit Testing Cosmos DB” section at the end of this article.
  • Lines 6 and 7: This creates an empty database so that my new entry doesn’t clash with any exsiting data.
  • Line 13: This is the first big thing with Cosmos DB – it won’t create unique primary keys. You have to provide a unique key, so most people use GUIDs.
  • Lines 23 and 26: Notice I use async versions of the EF Core commands. There is a warning to only use async methods in the EF Core Cosmos DB documentation.

And finally, here is what is placed in the Cosmos DB database:

{
    "CosmosBookId": 1,
    "Discriminator": "CosmosBook",
    "Price": 0,
    "PublishedDate": "2019-01-01T00:00:00",
    "Title": "Book Title",
    "id": "CosmosBook|1",
    "Reviews": [
        {
            "Comment": "Great!",
            "Discriminator": "CosmosReview",
            "NumStars": 5,
            "VoterName": "Reviewer1"
        },
        {
            "Comment": "Hated it",
            "Discriminator": "CosmosReview",
            "NumStars": 1,
            "VoterName": "Reviewer2"
        }
    ],
    "_rid": "+idXAO3Zmd0BAAAAAAAAAA==",
    "_self": "dbs/+idXAA==/colls/+idXAO3Zmd0=/docs/+idXAO3Zmd0BAAAAAAAAAA==/",
    "_etag": "\"00000000-0000-0000-8fc9-8720a1f101d5\"",
    "_attachments": "attachments/",
    "_ts": 1572512379
}

Notice that there are quite a few extra lines of data that EF Core/Cosmos DB adds to make this all work. Have a look at this link for information on what these extra properties do.

Building something more complex with Cosmos DB

There are plenty of simple examples of using the EF Core Cosmos DB database provider, but in my experience its not until you try and build a real application that you hit the problems. In my book, “Entity Framework Core in Action” I built a CQRS architecture database in chapter 14 using both a SQL and NoSQL database and as Cosmos DB wasn’t available I used another NoSQL database, RavenDB. I did this to get better read performance on my example book sales web site.

I wanted to redo this two-database CQRS architecture using the EF Core’s Cosmos DB provider. I had a go with a pre-release Cosmos DB provider in EF Core 2.2, but the EF Core 2.2. Cosmos DB provider had (big) limitations. Once EF Core 3 came out I started rebuilding the two-database CQRS architecture with its stable release of the Cosmos DB provider.

NOTE: The new version is in the master branch of the EfCoreSqlAndCosmos repo, with the older version in the branch called NetCore2_2Version.

This example application, called EfCoreSqlAndCosmos, is a site selling books, with various sorting, filtering, and paging features. I have designed the application so that I can compare the performance of a SQL Server database against a Cosmos DB database, both being queries via EF Core 3. Here is a picture of the application to give you an idea of what it looks like – you swap between the SQL Books page (shown) and the NoSQL Books page accessed by the “Books (NoSQL)” menu item.

NOTE: This code is open source and is available on GitHub. You can clone the repo and run it locally. It uses localdb for the SQL Server database and Azure Cosmos DB Emulator for the local Cosmos DB database.

What I found when I rebuilt my EfCoreSqlAndCosmos application

It turns out both the Cosmos DB and EF Core Cosmos DB provider have some changes/limitations over what I am used to with a SQL (relational) database. Some of the changes are because Cosmos DB database has a different focus – its great at scalability and performance but poor when it comes to complex queries (that’s true in most NoSQL databases too). But also, the EF Core 3.0 Cosmos DB provider has quite a few limitations that impacted what I could do.

This article has therefore morphed into a look at what you can, and can’t do with the EF Core 3 Cosmos DB provider, using my book app as an example. What I’m going to do is walk you thought the obstacles I encounter on trying to build my book app and explain what was the problem and how I got around it.

Issue #1: No unique key generation or constraints in Cosmos DB

In the initial example I showed you that Cosmos DB doesn’t create primary key in the way the SQL databases can do. That means you are likely to use something like a GUID, or GUID string, if you want to add to a Cosmos DB.

You also don’t get the sort of checking of primary/foreign keys that you have in SQL databases. For instance, if I add a book with the same primary key, I don’t get a constraint error from Cosmos DB, it just doesn’t save your new book.

My TestAddCosmosBookAddSameKey unit test shows that if you Add (i.e. try to create) a new entry with the same key as an existing entry in the database then the Add is ignored. You don’t get an exception and the only indications of that are:

  1. The number of changes that SaveChangesAsync returns doesn’t include that failed Add.
  2. The EF Core State of the added class turns to “Unchanged” after the SaveChangesAsync.

I quite like SQL constraints because it ensures that my database keeps is referential integrity, which means when a change is applied to a SQL database it makes sure all the primary key and foreign keys are correct against the database constraints. But Cosmos, and most NoSQL databases has no constraints like SQL databases. This is another feature that NoSQL database drops to make the database simpler and therefore faster.

In the end the way that Cosmos DB works is fine, as you can use the Guid.NewGuid() method to get a unique primary key value. But be aware of this “silent write fail” issue if you think all you are missing some data. 

NOTE: There was a bug about silently failing to write to an unknown database, but that is fixed in EF Core 3.1.

Issue #2: Counting number of books in Cosmos DB is SLOW!

In my original application I had a pretty standard paging system, with selection of page number and page size. This relies on being able to count how many books there are in the filtered list. But with the changeover to Cosmos DB this didn’t work very well at all.

First the EF Core 3.0 Cosmos DB provider does not support aggregate operators, e.g. Average, Sum, Min, Max, Any, All, and importantly for me, Count. There is a way to implement Count, but its really slow. I thought this was a real limitation, until I realised something about Cosmos DB – you really don’t want to count entries if you can help it – let me explain.

On SQL database counting things is very fast (100,000 books takes about 20 ms. on my PC). But using a Cosmos DB counting means accessing the data, and you are a) charged for all the data you access and b) you have to pay for the level of throughput you want provision for. Both of these is measured in RU/second, with the starting point being 400 RU/second for a Cosmos DB database on Azure.

The Azure Cosmos DB Emulator tells me that to count 100,000 of my book class using the best Count approach (i.e. Cosmos DB SQL: SELECT VALUE Count(c) FROM c) takes 200 ms. (not rate limited) and uses 1,608 RU/second – that is a costly query to provision for!

NOTE: The current EF Core solution, noSqlDbContext.Books.Select(_ => 1).AsEnumerable().Count() is worse. It takes 1 second to read 100,000 books.

So, the learning here is some things in Cosmos DB are faster than SQL and some things are slower, and more costly, than SQL. And you have to handle that.

This made me change the NoSQL side of the EfCoreSqlAndCosmos application to not use normal paging but use a Prev/Next page approach (see picture).

You might notice that Amazon uses a limited next/previous and says things like “1-16 of over 50,000 results for…” rather than count the exact number of entries.

Issue #3: Complex queries can be a problem in Cosmos DB

In my “filter by year” option I need to find all the years where books were published, so that the user can pick the year they are looking for from a dropdown. When I ran my code taken from my SQL example I got an exception with a link to this EF Core Issue which says that Cosmos DB has some limitations on queries.

See the two code snippets below from my filter dropdown code to see the difference from the Cosmos DB (left) and SQL Server (right) code. Notice the Cosmos DB query needed the second part of the query to be done in the application.

The learning from this example is that Cosmos DB doesn’t support complex queries. In fact the general rule for NoSQL databases is that they don’t have the range of query capabilities that SQL database has.

Issue #4: You can’t sort on a mixture of nullable and non-nullable entries

My SQL queries to show books contains a LINQ command to work out the average review stars for a book. SQL databases returns null if there aren’t any reviews for a book (see note at the end of this second for the technicalities of why that is).

So, when I built my Cosmos DB version, I naturally added a nullable double (double?) to match. But when I tried to order by the reviews it threw an exception because I had a mix of nulls and numbers. Now, I don’t know if this is a bug in EF Core (I have added an issue to the EF Core code) or an overall limitation.

UPDATE: It turns out its a bug in Cosmos DB which hasn’t been fixed yet. I suspect that fix won’t get into EF Core 3.1 but hopefully fixed in EF Core 5.

The solution for my application was easy – I just set a null value to 0 as my ReviewsCount property told me if there were any reviews. But be aware of this limitation if your using nulls, especially strings. PS. Saving/returning nulls work find, and Where clauses work too – it’s just OrderBy that has a problem.

NOTE: In my SQL version I use the LINQ command … p.Reviews.Select(y => (double?)y.NumStars).Average(), which converts into the SQL command AVG(CAST([r1].[NumStars] AS float)). The (double?) cast is necessary because the SQL AVE command returns null if there are no rows to average over.

Issue #5. Missing database functions

In my SQL filter by year I have the following piece of code:

var filterYear = int.Parse(filterValue);      
return books.Where(x => x.PublishedOn.Year == filterYear 
     && x.PublishedOn <= DateTime.UtcNow);

That returns all the books in the give year, but excludes books not yet published. I ran into two issues with the Cosmos DB version.

Firstly, the PublishedOn.Year part of the query gets converted into (DATEPART(year, [b].[PublishedOn]) in SQL database, but Cosmos DB doesn’t have that capability. So, to make the filter by year feature to work I had to add an extra property called “YearPublished” to hold the year.

The second part was the DateTime.UtcNow gets converted to GETUTCDATE() in a SQL database. Now Cosmos DB does have a GetCurrentDateTime () method, but at the moment EF Core 3.0 does not support that, and many other Cosmos database functions too (subscribe to this EF Core issue to track progress on this).

All is not lost. By me adding a new int property called “YearPublished” to my CosmosBook and getting the UtcNow from the client I got the query to work – see below:

var now = DateTime.UtcNow;
var filterYear = int.Parse(filterValue);      
return books.Where(x => x.YearPublished == filterYear 
     && x.PublishedOn <= now);

This is another example of different query features between SQL databases (which are very similar because of the definition of the SQL language) and a range of different NoSQL databases, don’t have any standards to follow. On the plus side Cosmos DB automatically indexes every property by default, which helps my application.

Issue #6. No relationships (yet)

We now have a stable, usable Cosmos DB database provider in EF Core 3.0, but its certainly not finished. From the EF Core Azure Cosmos DB Provider Limitations page in the EF Core docs it said (lasted updated 09/12/2019)

Temporary limitations

  • Even if there is only one entity type without inheritance mapped to a container it still has a discriminator property.
  • Entity types with partition keys don’t work correctly in some scenarios
  • Include calls are not supported
  • Join calls are not supported

Looking at the last two items that means, for now, you will have to put all the data into the one class using [Owned] types (see the example I did right at the beginning). That’s OK for my example application, but does cut out a number of options for now. I will be interesting to see what changes are made to the Cosmos DB provider in EF Core 5 (due out in November 2020).

Issue #7. Permanent limitation due to Cosmos DB design

The Cosmos DB database has certain limitations over what you are used to with SQL database. The two big ones are:

1. No migrations – can cause problems!

Cosmos DB, like many NOSQL databases, saves a json string, which Cosmos DB calls a “Document” – I showed you that in the first example. Cosmos DB doesn’t control the content of that Document, it just reads or writes to it, so there’s no ‘migration’ feature that you are used change all the Documents in a database. And this can cause problems you need to be aware of!

For instance, say I updated my CosmosBook class to add a new property of type double called “Discount”. If you now tried to read the ‘old’, i.e. data that was written to the database before you added the “Discount” property, then you would get an exception. That’s because it expects a property called “Discount” to fill in the non-nullable property in your class.

As you can see, this means you need to think about migrations yourself: either run special code to add a default value for the new, non-nullable property or make your new properties nullable

NOTE: Taking a property out of a class is fine – the data is in the document, but it just isn’t read. When you update document, it replaced all old data with the new data.

2. No client-initiated transactions

Cosmos only provide Document-level transactions. That means it updates the whole document, or doesn’t change anything in the document. That way you can be sure the document is correct. But doing any clever transaction stuff that SQL database provide are not in Cosmos DB (and most, but not all, NoSQL databases).

This makes some things that you can do in SQL quite hard, but if you really need transactions then you should either need a NoSQL that has that feature (a few do) or go back to a SQL database.

However there are ways to implement a substitute for a transaction, but it’s quite complex. See the series called Life Beyond Distributed Transactions: An Apostate’s Implementation, by Jimmy Bogard for a way to work around this.

UPDATE: Cosmos DB is continually changing and in the the November 2019 “what’s new” article they have added some form of transactional feature to Cosmos DB (and support for GroupBy). So my comments about no transactions might change in the future.

Unit testing Cosmos DB code

I know this article is long but I just wanted to let you know about the very useful Azure Cosmos DB Emulator for testing and developing your Cosmos DB applications. The Emulator runs locally and store your results using localdb. It also come with a nice interface to see/edit the data.

NOTE: You could use a real Cosmos DB on Azure, but they aren’t cheap and you will need lots of databases if you want to run your unit tests in parallel.

I have added a couple of extensions to my EfCore.TestSupport library to help you set up the options for Cosmos DB accessing the Azure Cosmos DB Emulator. Here is a very simple example of a unit test using my GetCosmosDbToEmulatorOptions<T> method, with comments at the end

[Fact]
public async Task ExampleUsingCosmosDbEmulator()
{
    //SETUP
    var options = this.GetCosmosDbToEmulatorOptions<CosmosDbContext>();
    using var context = new CosmosDbContext(options);
    await context.Database.EnsureDeletedAsync();
    await context.Database.EnsureCreatedAsync();

    //ATTEMPT
    var cBook = new CosmosBook {CosmosBookId = 1, Title = "Book Title")
    context.Add(cBook);
    await context.SaveChangesAsync();

    //VERIFY
    (await context.Books.FindAsync(1)).ShoudlNotBeNull();
}

Comments on certain lines.

  • Line 5: This method in by library links to your local Azure Cosmos DB Emulator creates a database using the name of the test class. Having a name based on the test class means only the tests in this class uses that database. That’s important because xUnit can run test classes in parallel and you don’t want tests in other classes writing all over the database you are using for your test.
  • Line 7 & 8: I found deleting and the recreating a Cosmos DB is a quick (< 2 secs on my PC) way to get an empty database. That’s much better than SQL Server database that can take around 8 to 10 seconds on my PC.

NOTE: The GetCosmosDbToEmulatorOptions have a few other options/calls. You can find the full documentation about this here. Also, the EfCoreSqlAndCosmos repo has lots of unit tests you can look at to get the idea – see the Test project.

Conclusion

Wow, this article is very long and took me way longer than I expected. I really hope you find this useful. I certainly learnt a lot about what Cosmos DB can do and the state of the EF Core 3 Cosmos DB database provider.

This is the first non-preview version of the EF Core Cosmos DB database provider and many features aren’t available, but it had enough for me to use it successfully. Its going to be very interesting to see where both the EF Core Cosmos provider and the Cosmos DB will be when NET Core 5 comes out in November 2020.

As I said at the beginning of the article, I think both SQL and NosSQL databases have a role in building modern applications. Each approach has different strengths and weaknesses, and this article is about seeing what they are. But its not until I write the next article comparing the performance of various approaches will some of the strengths of the Cosmos DB will come through.

In the meantime, I hope this article and its companion EfCoreSqlAndCosmos repo helps you in understanding and using Cosmos DB.

Happy coding!

If you have a ASP.NET Core or Entity Framework Core problem that you want help on then I am available as a freelance contractor. Please send me a contact request via my Contact page and we can talk some more on Skype.