How to take an ASP.NET Core web site “Down for maintenance”

Last Updated: October 20, 2022 | Created: September 20, 2022

If you have an e-commerce or business web app used by lots of users, then you really don’t want that app to be “down” (e.g. “site offline” or “site not found”) because it’s bad for business. But at the same time some database changes are just too complex to allow users to access a database while the data is being changed. This article describes a way to momentarily divert users while a database is changed, which means that the database change code has exclusive access, and any change has the smallest effect on your logged-in users.

I designed this approach for multi-tenant applications, especially when using sharding. In these sorts of applications a single tenant might need to be changed or moved, and the code to do that needs exclusive access. See this Microsoft article which describes the split and merge processes, which are two examples of changes that need exclusive access.

This article is part of a series that covers .NET multi-tenant applications in general. The articles in the “Building ASP.NET Core and EF Core multi-tenant apps” series are:

  1. The database: Using a DataKey to only show data for users in their tenant
  2. Administration: different ways to add and control tenants and users
  3. Versioning your app: Creating different versions to maximise your profits
  4. Hierarchical multi-tenant: Handling tenants that have sub-tenants
  5. Advanced techniques around ASP.NET Core Users and their claims
  6. Using sharding to build multi-tenant apps using EF Core and ASP.NET Core
  7. Three ways to securely add new users to an application using the AuthP library
  8. How to take an ASP.NET Core web site “Down for maintenance” (This article)
  9. Three ways to refresh the claims of a logged-in user

TL;DR; – Summary of this article

  • The feature described solves a problem that can arise in multi-tenant applications: it can temporarily stop users from accessing a tenant’s data while a complex change is applied to the tenant data. A “complex change” might be moving a tenant’s data to another database.
  • The solution uses ASP.NET Core’s middleware to intercept every HTTP request and check that the data the user might access isn’t “down”, i.e. that the data is being changed and mustn’t be accessed. If the data the user uses is “down” they are diverted to a “down for maintenance – back soon” page.
  • Because the middleware is called on every HTTP request, I have used the FileStore distributed cache, which has a read time of ~25 ns, which means this feature doesn’t slow down the application.
  • I have implemented the code in version 3.4.0 of my open-source AuthPermissions.AspNetCore library – see the “Down for Maintenance” documentation. But the design and code of this feature can be copied to any ASP.NET Core application.

Setting the scene – why did I need this feature

I have a library called AuthPermissions.AspNetCore (referred to as AuthP in this article) that helps developers to build complex multi-tenant applications and it includes sharding, that is each tenant has their own database. One of the best ways to manage lots of databases is Azure SQL Server elastic pools, but the suggested elastic pool support library is not supported any more. So, if I wanted to use SQL Server elastic pools, then I needed to write the split-merge code myself.

I had built most of the features needed, like defining a tenant and the keys for each tenant and sharding, in version 3.0.0 of the AuthP library, but the last missing feature was the ability to stop users from accessing a tenant while it is changed / moved (I use the term move for both split and merge). That’s because if a user is accessing the tenant data at the same time as a change / move, then the user might get the wrong data or, more crucially, it can cause data loss during a move.

The diagram below shows the process I need to build if I want to successfully change / move a tenant’s data while the application is still running. Note that only tenant users linked to “tenant 123” are diverted, while users not linked to “tenant 123” would work normally.

NOTE: In the AuthP library the key(s) to a tenant’s data are held in the user’s claims, which means that after a change / move the user’s tenant claim(s) need updating. The AuthP library has a feature called “update claims on tenant change” that handles this (see the AuthP documentation).

It turns out that the solution to implement this “down” process is to use ASP.NET Core’s middleware. You can intercept a user and divert them to another page / url if a move / change is in action by adding an extra middleware in the correct place. I call a divert a “down” because the tenant is “down for maintenance” while the change / move is being executed.

The downside of adding the extra middleware is that the code is called on every HTTP request. This means the middleware needs to be fast, otherwise you will slow down your whole application for a few, infrequent change / move diverts. I solved this by creating the FileStore distributed cache, which has a very fast read time (e.g. ~25 ns).

Read on to see how this works and how you could use it.

Design aims: what database changes do I want to cover?

The main “down” feature is temporarily diverting users accessing a tenant database while a change / move is being applied, but I also added some extra diverts, which are listed below:

  • Manual, application “down”: Allows an admin user to manually “down” the whole application. Every user apart from the admin who took the app “down” will be diverted to a page with an explanation and expected time when the app will be available.
  • Manual, tenant database “down”: Allows an admin user to manually “down” a tenant database, thus diverting all users linked to the tenant database to a page saying, “stopped by admin”. Access to the tenant can be restored by an admin manually removing this “down”.
  • Tenant database Delete: This permanently diverts all users linked to the deleted tenant to a page saying, “the tenant is deleted”. This is a permanent divert, but it can be removed manually. 

Here is a diagram that shows how the “down for maintenance” feature can be implemented in ASP.NET Core.

The rest of the article describes each step in the “down for maintenance” feature, with references to the code in my AuthP library. The steps are:

  1. Startup: Registering the services
  2. Adding a StatusController (or an equivalent Web API)
  3. Using the ISetRemoveStatus service to set / remove a “down” state
  4. Understanding the “down for maintenance” middleware
  5. Other things to consider when moving a tenant database

1. Startup: Registering the services

There are two parts to setting up the “down for maintenance” feature:

  • Registering the “down for maintenance” services
  • Adding the “down for maintenance” middleware.

Both parts are applied in the ASP.NET Core Program / Startup code. First is the registering of the FileStore cache, which holds the various “down” statuses, and the SetRemoveStatus class, which provides simple methods to add / remove “down” statuses. The code below is added in the startup section that registers services with the .NET dependency injection provider.

//previous code left out
builder.Services.AddDistributedFileStoreCache(options =>
{
    options.WhichVersion = FileStoreCacheVersions.Class;
}, builder.Environment);

builder.Services.AddTransient<ISetRemoveStatus, SetRemoveStatus>();

The “down for maintenance” middleware is added in the “app” part of the ASP.NET Core startup code. The UseDownForMaintenance line in the code below adds the extra middleware.

var app = builder.Build();
//other app code left out

app.UseAuthentication();
app.UseAuthorization();
app.UseDownForMaintenance();

//other code left out

The important thing is that the “down for maintenance” middleware is added AFTER the UseAuthorization method. That’s because the “down for maintenance” middleware needs access to the user’s claims.

2. Create a Controller / web APIs to handle the “down for maintenance”

You need pages / APIs to handle the following:

  • For the admin users
    • Look at all the current “downs” and have the ability to remove any
    • Manually set the app “down” (with messages for the users)
    • Manually set a tenant “down”
  • For diverted users
    • App Down
    • Tenant down while being updated
    • Tenant down by admin
    • Tenant is deleted

In the Example4 web site (hierarchical tenant design) and Example6 web site (single-level + sharding) I have a controller called StatusController that contains the actions / pages listed above. Please look at Example4’s StatusController for an example of what you need to create.

NOTE: the diverted pages are hard coded into the RedirectUsersViaStatusData class, while the controller’s name can be changed. If you want to have different urls for the diverted pages, then you need to copy the code and register your version of the RedirectUsersViaStatusData class.

3. Using the ISetRemoveStatus service to set / remove a “down” state

The SetRemoveStatus class contains the code to set, remove and display the “down” statuses in the FileStore distributed cache. There are several types of divert, and this service creates the cache key which defines the type of divert that should be applied to the user.

The AppDown divert is easy because it has one divert, but the tenant divert is more complex because a) it has three divert types and b) a divert is unique to a tenant. Each “down” entry in the FileStore distributed cache has a unique key name, which allows you to have multiple “downs” at once. In the case of a tenant “down” the FileStore entry’s value is the tenant key, which is used to detect if the user is linked to a tenant that is in a “down” state.
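To make the key / value idea concrete, here is a minimal sketch of how such entries could be named and matched against a user. The key strings, the DownKind enum and the DownKeys class are hypothetical names made up for this illustration; they are not the key format or the API that the AuthP library actually uses.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical divert types, mirroring the three tenant diverts described above.
public enum DownKind { Update, ManualDown, Deleted }

public static class DownKeys
{
    // Assumed key names, chosen so that a prefix check can filter entries quickly.
    public const string AppDownKey = "Down-App";
    public const string TenantPrefix = "Down-Tenant-";

    // Builds a key that a) says what type of divert it is and b) is unique per tenant.
    public static string ForTenant(DownKind kind, int tenantId)
        => $"{TenantPrefix}{kind}-{tenantId}";

    // Returns the divert type affecting the user's tenant key, or null if there is none.
    public static DownKind? FindTenantDown(
        IReadOnlyDictionary<string, string> cacheEntries, string userDataKey)
    {
        foreach (var entry in cacheEntries)
        {
            // Skip entries that aren't tenant "downs" (e.g. the AppDown entry).
            if (!entry.Key.StartsWith(TenantPrefix, StringComparison.Ordinal)) continue;
            // The entry's value is the tenant key, so compare it with the user's key.
            if (entry.Value != userDataKey) continue;

            var kindName = entry.Key.Substring(TenantPrefix.Length).Split('-')[0];
            return Enum.Parse<DownKind>(kindName);
        }
        return null;
    }
}
```

Because every tenant gets its own entry, several tenants can be “down” at the same time, and removing one entry doesn’t affect the others.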

The ISetRemoveStatus service makes it easy for you to wrap your change / move code with a “down” at the start and a removal of the “down” at the end. The code below shows an example of how the ISetRemoveStatus service would work: the SetTenantDownWithDelayAsync call sets the “down” and the removeDownAsync call removes it.

[HttpPost]
[ValidateAntiForgeryToken]
[HasPermission(Example6Permissions.MoveTenantDatabase)]
public async Task<IActionResult> MoveDatabase(
    ShardingSingleLevelTenantDto input)
{
    //Set the tenant "down". The returned delegate removes the "down" again
    var removeDownAsync = await _upDownService
        .SetTenantDownWithDelayAsync(
              TenantDownVersions.Update, input.TenantId);
    //The change / move code that needs exclusive access
    var status = await _authTenantAdmin
        .MoveToDifferentDatabaseAsync(input.TenantId,
              input.HasOwnDb, input.ConnectionName);
    //Remove the "down" so that the tenant's users can access the data again
    await removeDownAsync();

    return status.HasErrors
        ? RedirectToAction(nameof(ErrorDisplay),
              new { errorMessage = status.GetAllErrors() })
        : RedirectToAction(nameof(Index),
              new { message = status.Message });
}

As you can see, you define the type of tenant change via the TenantDownVersions enum. The ISetRemoveStatus service handles creating the key name for the actual “down” entry in the FileStore distributed cache. The “down” entry key string is designed to make finding / filtering the “down” values work quickly, so the key string is a bit complex. The figure below shows the various combinations of key names, which a) define what type of divert it is, and b) provide a unique name for each tenant.

NOTE: For a tenant “down” entry the value is the tenant’s unique key, while for the AppDown the value contains a message, expected time, and UserId of the user that “downed” the whole app.
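One thing to think about when wrapping your own change / move code is what happens if that code throws an exception before the “down” is removed. The sketch below shows one way to guarantee the removal with a try / finally. The DownGuard class and its parameters are hypothetical and are not part of the AuthP library; setDownAsync stands in for a call that sets the “down” and returns a delegate that removes it.

```csharp
using System;
using System.Threading.Tasks;

public static class DownGuard
{
    // setDownAsync: hypothetical stand-in for a call that sets the "down" and
    //               returns a delegate that removes the "down" again.
    // changeAsync:  the change / move code that needs exclusive access.
    public static async Task<T> RunWhileDownAsync<T>(
        Func<Task<Func<Task>>> setDownAsync,
        Func<Task<T>> changeAsync)
    {
        var removeDownAsync = await setDownAsync();
        try
        {
            return await changeAsync();
        }
        finally
        {
            // Runs whether the change succeeded or threw an exception.
            await removeDownAsync();
        }
    }
}
```

Whether the “down” should be removed after a failure is a judgment call. If a failed move could leave the tenant’s data half-moved, you might prefer to leave the tenant “down” and remove it manually once the data has been checked.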

4. Understanding the “down for maintenance” middleware

The middleware code (see RedirectUsersViaStatusData class) is called on every HTTP request, and its job is to quickly let through a user if there isn’t a “down” status that affects the current user. There are three stages in this middleware to cover each part of the filter. They are:

NOTE: I use the term admin user (see this link) to define a user who is managing the application. These types of users have a) access to high-level admin features and b) aren’t linked to a tenant.

STAGE 1: Allowed URLs get through

The middleware allows two types of URLs.

  • You can log in and log out. I added this when I “downed” the app and then reran the app, at which point I couldn’t log in to remove the “App down”!
  • I allow access to the Status controller. This allows an admin user to manually turn off a “down” if anything goes wrong.

STAGE 2: Handle AppDown

The AppDown feature stops all users from using the application’s features, apart from the admin user that “downed” the app. This means that the admin user can check / fix the problem before removing the “down” on the app.

This feature is there for situations where the application’s software or data can’t be updated by the normal deploy / migrate approach. You will rarely need the AppDown feature, but it’s there for emergencies.

STAGE 3: Handle Tenant “down”

The main usage of the middleware is to manage changes to a tenant’s data, and the code uses the start of the “down” key to detect which type of divert is needed. The three types are:

  • Tenant down while being updated
  • Tenant down by an admin user (known as tenant “manual down”)
  • Tenant is deleted (this stops users trying to access a tenant that doesn’t exist)

NOTE: An example of the code to take a tenant “down” while being updated can be found in section 3.

The middleware code isn’t complex, but it’s a bit hard to follow, so I have provided a flowchart to show how the three stages are handled. The important thing is that the middleware is very fast (by using the FileStore distributed cache) at letting through users when no “down” is active.

NOTE: The RedirectUsersViaStatusData class has comments starting with the three STAGES shown in the flowchart.
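The three stages can be sketched as a single function that returns either a redirect url or null (meaning “let the request through”). This is a simplified illustration and not the code in the RedirectUsersViaStatusData class: the key names, url paths and parameters are all hypothetical, and the AppDown value is assumed to hold only the UserId of the admin who “downed” the app.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public static class DownForMaintenanceSketch
{
    // Assumed key names and paths, for illustration only.
    private const string AppDownKey = "Down-App";
    private const string TenantPrefix = "Down-Tenant-";

    // Returns the url to divert the user to, or null to let the request through.
    public static string? GetRedirect(string path, string? userId,
        string? userDataKey, IReadOnlyDictionary<string, string> downs)
    {
        // Fast path: no "down" is active, so every request gets through.
        if (downs.Count == 0) return null;

        // STAGE 1: allowed URLs (login / logout and the Status controller).
        if (path.StartsWith("/Status", StringComparison.OrdinalIgnoreCase) ||
            path.StartsWith("/Identity/Account", StringComparison.OrdinalIgnoreCase))
            return null;

        // STAGE 2: AppDown diverts everyone apart from the admin who set it.
        if (downs.TryGetValue(AppDownKey, out var downedByUserId)
            && downedByUserId != userId)
            return "/Status/AppDown";

        // STAGE 3: tenant "downs". Users not linked to a tenant are let through.
        if (userDataKey == null) return null;
        foreach (var entry in downs)
        {
            if (!entry.Key.StartsWith(TenantPrefix, StringComparison.Ordinal)
                || entry.Value != userDataKey) continue;

            // The start of the key defines which type of divert is needed.
            if (entry.Key.StartsWith(TenantPrefix + "Deleted", StringComparison.Ordinal))
                return "/Status/TenantDeleted";
            if (entry.Key.StartsWith(TenantPrefix + "ManualDown", StringComparison.Ordinal))
                return "/Status/TenantManualDown";
            return "/Status/TenantUpdating";
        }
        return null;
    }
}
```

In a real middleware a non-null result would become a redirect response, and a null result would call the next middleware in the pipeline. Note that the divert pages sit under the Status path, which stage 1 lets through, so a diverted user isn’t diverted again.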

5. Other things to consider when moving a tenant database

The tenant “Down for Maintenance” feature solves the most complex issue, which is ensuring that the tenant data isn’t accessed while the data is moved. But there are some extra issues you need to consider, which the AuthP library already has solutions for. The issues are:

  1. Updating the tenant user’s DataKey claims on a move
  2. An internal hierarchical move needs to “down” two parts of the tenant data
  3. The admin access to tenant data feature needs extra code in the middleware

5.1. Updating the tenant user’s DataKey claims on a move

If you are moving a database in a sharding multi-tenant application or moving data in a hierarchical multi-tenant application, then the information used by the user to access the tenant data will change. Therefore, you MUST update the information used by the user to access the tenant data.

In the AuthP library the user’s key to a tenant’s data is held in the user’s claims, which makes the user access very fast (see this section of an earlier article). But that means that the tenant claims need to be updated when the DataKey changes. AuthP has a feature that detects a change to the tenant DataKey parts and then makes sure all the logged-in users have their claims updated. See the AuthP “update claims on tenant change” documentation on how this works.

5.2. An internal hierarchical move needs to “down” two parts of the tenant data

The AuthP hierarchical multi-tenant design has a move feature where a section of the hierarchical data can be moved to another part of the hierarchy, known as the parent (see this example). In this case you need to “down” both the section to be moved and the section that it is moved to.

For this reason, the SetTenantDownWithDelayAsync method has an optional parameter called parentId. If the parentId is not zero, then it will also “down” the parent during the hierarchical move. The code below shows this, with input.ParentId passed as the extra parentId parameter.

var removeDownAsync = await _upDownService
    .SetTenantDownWithDelayAsync(
        TenantDownVersions.Update, input.TenantId, 
        input.ParentId);
var status = await _authTenantAdmin
    .MoveHierarchicalTenantToAnotherParentAsync
        (input.TenantId, input.ParentId);
await removeDownAsync();

5.3. The “admin access to tenant” data feature needs extra code in the middleware

The AuthP library provides a feature that allows admin / support users (i.e. users not linked to a tenant) to temporarily gain access to a tenant’s data (see the admin access to tenant documentation for more information).

This is implemented by using a cookie to contain the tenant DataKey, but the “down for maintenance” middleware doesn’t contain code to handle that. Giving an admin user a way to access the tenant’s data is useful if a problem occurs in the change / move, but the admin must be aware of any tenant change / move and not try to access that tenant (or must turn off the “admin access to tenant” feature).

Conclusions

Back in 2015 I wrote an article about how to take an ASP.NET MVC5 web site “Down for maintenance” and now in 2022 this article provides a version for an ASP.NET Core application. The basic approach of using middleware is the same, but this latest approach also contains features to handle multi-tenant applications.

Both the older ASP.NET MVC5 version and the latest ASP.NET Core version are designed to be quick, because the code is run on every HTTP request. Both versions use a shared file to work across multiple instances of the web application, for instance when you use Azure’s scale-out. But the new version has much more complex needs, with tenant-level “down” features, which required a more sophisticated approach. This is handled by the FileStore distributed cache acting as a fast-read / slow-write database.

With this feature added to version 3.4.0 of the AuthP library you can safely manage tenants while users are accessing your multi-tenant application.
