Using health checks to run async tasks in ASP.NET Core

In this post, I show an approach to running async tasks on app startup which I discounted in my first post in this series, but which Damian Hickey recently expressed a preference for. This approach runs the startup tasks using the IHostedService abstraction, with a health check to indicate when all startup tasks have completed. Additionally, a small piece of middleware ensures that non-health-check traffic returns a 503 response when the startup tasks have not yet completed.

ASP.NET Core health checks: a brief primer

The approach in this post uses the Health Check functionality that was introduced in ASP.NET Core 2.2. Health checks are a common feature of web apps that indicate whether an app is running correctly. They're often used by orchestrators and process managers to check that an application is functioning and able to serve traffic.

The Health Checks functionality in ASP.NET Core is highly extensible. You can register any number of health checks that test some aspect of the "health" of your app. You also add the HealthCheckMiddleware to your app's middleware pipeline. This responds to requests at a path of your choosing (e.g. /healthz) with either a 200 indicating the app is healthy, or a 503 indicating the app is unhealthy. You can customise all of the details, but that's the broad concept.
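As a sketch of that customisation, the HealthCheckOptions overload of UseHealthChecks lets you change the status code returned for each HealthStatus and replace the response body. This assumes ASP.NET Core 2.2 with the health check packages referenced. HealthCheckOptionsExample is a hypothetical class name, and the "name: status" text format is purely illustrative. It is a configuration fragment that would normally live in Startup.Configure():

```csharp
using System.Linq;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Diagnostics.HealthChecks;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical class: in a real app this code would sit in Startup.Configure()
public class HealthCheckOptionsExample
{
    public void Configure(IApplicationBuilder app)
    {
        app.UseHealthChecks("/healthz", new HealthCheckOptions
        {
            // By default Degraded maps to 200; here it is treated as "not ready" (503)
            ResultStatusCodes =
            {
                [HealthStatus.Degraded] = StatusCodes.Status503ServiceUnavailable
            },

            // Replace the default body with one "name: status" line per registered check
            ResponseWriter = (httpContext, report) =>
            {
                httpContext.Response.ContentType = "text/plain";
                var lines = report.Entries.Select(entry => $"{entry.Key}: {entry.Value.Status}");
                return httpContext.Response.WriteAsync(string.Join("\n", lines));
            }
        });
    }
}
```

Unhealthy still maps to 503 and Healthy to 200, because only the Degraded entry of the default mapping is overwritten.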

Damian argued for using health checks to indicate when an app's startup tasks have completed and the app is ready to start handling requests. He expressed a preference for having Kestrel running quickly and responding to health check requests, rather than allowing the health requests to time out (which is the behaviour you get with either approach I discussed in part 2 of this series). I think that's a valid point, and it served as the basis for the approach shown in this post.

There's no reason you have to use the built-in ASP.NET Core 2.2 health check functionality. The actual health check is very simple, and could easily be implemented as a small piece of middleware instead if you prefer.

Changing the requirements for async startup tasks

In my first post in this series, I described the need to run various tasks on app startup. My requirements were the following:

1. Tasks should be run asynchronously (i.e. using async/await), avoiding sync-over-async.
2. The DI container (and preferably the middleware pipeline) should be built before the tasks are run.
3. All tasks should be completed before Kestrel starts serving requests.

In addition, for the examples I provided, all tasks were run in series (as opposed to in parallel), waiting for one task to complete before starting the next one. That wasn't an explicit requirement, just one that simplified things somewhat.

Points 1 and 2 still hold for the approach shown in this post, but point 3 is explicitly dropped and exchanged for the following two points:

Kestrel starts and can handle requests before the tasks start running, but it should respond to all non-health-check traffic with a 503 response.
Health checks should only return "Healthy" once all startup tasks have completed.

In the next section I'll give an overview of the various moving parts to meet these requirements.

An overview of running async startup tasks using health checks

There are four main components to the solution shown here:

1. A shared (singleton) context. This keeps track of how many tasks need to be executed, and how many tasks are still running.
2. One or more startup tasks. These are the tasks that we need to run before the app starts serving traffic. Each task is:
   - Implemented as an IHostedService, using the standard ASP.NET Core background service functionality.
   - Registered with the shared context when the app starts.
   - Started just after Kestrel starts (along with any other IHostedService implementations).
   - Responsible for marking itself as complete in the shared context once it has finished.
3. A "startup tasks" health check. An ASP.NET Core health check implementation that checks the shared context to see if the tasks have completed. It returns Unhealthy until all tasks have completed.
4. A "barrier" middleware. A small piece of middleware that sits just after the standard HealthCheckMiddleware. It blocks all other requests by returning a 503 until the shared context indicates that all startup tasks have completed.

I'll walk through each of those components in the following sections to build up the complete solution.

Keeping track of completed tasks

The key component in this design is the StartupTaskContext. This is a shared/singleton object that is used to keep track of whether all of the startup tasks have finished.

In keeping with typical ASP.NET Core design concepts I haven't used a static class or methods, and instead rely on the DI container to create a singleton instance. This isn't necessary for the functionality; you could just use a shared object if you like. Either way, you need to ensure the methods are thread-safe, as they may be accessed concurrently if multiple startup tasks complete at the same time.

using System.Threading;

public class StartupTaskContext
{
    private int _outstandingTaskCount = 0;

    public void RegisterTask()
    {
        Interlocked.Increment(ref _outstandingTaskCount);
    }

    public void MarkTaskAsComplete()
    {
        Interlocked.Decrement(ref _outstandingTaskCount);
    }

    // Volatile.Read ensures we observe the latest count written by other threads
    public bool IsComplete => Volatile.Read(ref _outstandingTaskCount) == 0;
}

This is pretty much the most basic implementation: it keeps a count of the number of tasks that haven't yet completed. Once _outstandingTaskCount reaches 0, all startup tasks are complete. There's obviously more that could be done to make the implementation robust (for example, guarding against MarkTaskAsComplete() being called more times than RegisterTask()), but it will do for most cases.
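To see why the Interlocked calls matter, here is a small standalone console sketch (not part of the original sample) that registers 1,000 tasks and then marks them all as complete concurrently from the thread pool. It includes its own copy of StartupTaskContext so that it compiles on its own; drop the copy if you already have the class. StartupTaskContextDemo and RunAsync are illustrative names:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Standalone copy of StartupTaskContext so this sketch compiles by itself
public class StartupTaskContext
{
    private int _outstandingTaskCount = 0;

    public void RegisterTask() => Interlocked.Increment(ref _outstandingTaskCount);

    public void MarkTaskAsComplete() => Interlocked.Decrement(ref _outstandingTaskCount);

    public bool IsComplete => Volatile.Read(ref _outstandingTaskCount) == 0;
}

public static class StartupTaskContextDemo
{
    // Registers taskCount tasks, completes them all concurrently, and returns
    // the value of IsComplete observed before and after they finish
    public static async Task<(bool Before, bool After)> RunAsync(int taskCount)
    {
        var context = new StartupTaskContext();
        for (var i = 0; i < taskCount; i++)
        {
            context.RegisterTask();
        }

        var before = context.IsComplete;

        // Simulate many startup tasks finishing at (roughly) the same time
        var completions = Enumerable.Range(0, taskCount)
            .Select(_ => Task.Run(() => context.MarkTaskAsComplete()));
        await Task.WhenAll(completions);

        return (before, context.IsComplete);
    }

    public static async Task Main()
    {
        var (before, after) = await RunAsync(1000);
        Console.WriteLine($"Before: {before}, after: {after}");
    }
}
```

Running it prints Before: False, after: True. With a plain _outstandingTaskCount-- instead of Interlocked.Decrement, concurrent decrements could be lost, leaving the count above zero so that IsComplete never becomes true.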

As well as the shared context, we need some startup tasks to run. We mark services as startup tasks using a marker interface, IStartupTask, which inherits from IHostedService.

public interface IStartupTask : IHostedService { }

I used the built-in IHostedService interface because WebHost automatically starts IHostedService implementations after Kestrel, and it lets you use helper classes like BackgroundService, which simplify writing long-running tasks.

As well as implementing the marker interface, a startup task should call MarkTaskAsComplete() on the shared StartupTaskContext after completing its work.

For simplicity, the example service shown below just waits for 10 seconds before calling MarkTaskAsComplete() on the injected StartupTaskContext:

using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Hosting;

public class DelayStartupTask : BackgroundService, IStartupTask
{
    private readonly StartupTaskContext _startupTaskContext;
    public DelayStartupTask(StartupTaskContext startupTaskContext)
    {
        _startupTaskContext = startupTaskContext;
    }

    // Run the task: wait 10 seconds, then mark this task as complete
    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        await Task.Delay(10_000, stoppingToken);
        _startupTaskContext.MarkTaskAsComplete();
    }
}

As is common in ASP.NET Core, I created some helper extension methods for registering the shared context and startup tasks with the DI container:

AddStartupTasks() registers the shared StartupTaskContext with the DI container, ensuring it's not added more than once.
AddStartupTask<T>() adds a specific startup task. It:
- Registers the task with the shared context by calling RegisterTask()
- Adds the shared context to the DI container if it isn't already registered
- Adds the task <T> as a hosted service (as IStartupTask implements IHostedService)

using System.Linq;
using Microsoft.Extensions.DependencyInjection;

public static class StartupTaskExtensions
{
    private static readonly StartupTaskContext _sharedContext = new StartupTaskContext();
    public static IServiceCollection AddStartupTasks(this IServiceCollection services)
    {
        // Don't add StartupTaskContext if we've already added it
        if (services.Any(x => x.ServiceType == typeof(StartupTaskContext)))
        {
            return services;
        }

        return services.AddSingleton(_sharedContext);
    }

    public static IServiceCollection AddStartupTask<T>(this IServiceCollection services) 
        where T : class, IStartupTask
    {
        _sharedContext.RegisterTask();
        return services
            .AddStartupTasks() // in case AddStartupTasks() hasn't been called
            .AddHostedService<T>();
    }
}

Registering the example task DelayStartupTask in Startup.ConfigureServices() is a single method call:

public void ConfigureServices(IServiceCollection services)
{
    // ... Existing configuration
    services.AddStartupTask<DelayStartupTask>();
}

That takes care of the startup task mechanics; now we can add the health check.

Implementing the StartupTasksHealthCheck

In many cases, if you need a custom health check you should check out the BeatPulse library. It integrates directly with ASP.NET Core 2.2, and currently lists about 20 different checks, including system (disk, memory) and network checks, as well as integration checks like RabbitMQ, SqlServer, or Redis.

Luckily, if you do need to write your own custom health check, implementing IHealthCheck is straightforward. It has a single asynchronous method, CheckHealthAsync, from which you return a HealthCheckResult instance whose status is one of three values: Healthy, Degraded, or Unhealthy.

Our custom health check checks the value of StartupTaskContext.IsComplete and returns Healthy or Unhealthy as appropriate.

using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

public class StartupTasksHealthCheck : IHealthCheck
{
    private readonly StartupTaskContext _context;
    public StartupTasksHealthCheck(StartupTaskContext context)
    {
        _context = context;
    }

    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context, 
        CancellationToken cancellationToken = default)
    {
        if (_context.IsComplete)
        {
            return Task.FromResult(HealthCheckResult.Healthy("All startup tasks complete"));
        }

        return Task.FromResult(HealthCheckResult.Unhealthy("Startup tasks not complete"));
    }
}

To register the health check in Startup.ConfigureServices(), call the AddHealthChecks() extension method, followed by AddCheck<>(). You can provide a meaningful name for the health check, "Startup tasks" in this case:

public void ConfigureServices(IServiceCollection services)
{
    // ... Existing configuration
    services.AddStartupTask<DelayStartupTask>();

    services
        .AddHealthChecks()
        .AddCheck<StartupTasksHealthCheck>("Startup tasks");
}

Finally, add the health check middleware to the start of your middleware pipeline in Startup.Configure(), specifying the path to use for the health check ("/healthz"):

public void Configure(IApplicationBuilder app, IHostingEnvironment env)
{
    app.UseHealthChecks("/healthz");
    
    // ... Other middleware config
}

That's all the pieces you need to achieve the main goal of this post. If you start the app and hit the /healthz endpoint, you'll get a 503 response and see the text Unhealthy. After 10 seconds, once the DelayStartupTask completes, you'll get a 200 response and the Healthy text:

Example of health check failing, and then succeeding

Note that adding this health check doesn't affect any other health checks you might have. The health check endpoint will return unhealthy until all of the health checks pass, including the StartupTasksHealthCheck. Also note that you could implement the above functionality using a small piece of middleware if you don't want to use the built-in ASP.NET Core health check functionality.
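If you'd rather not use the health check packages, the middleware-only alternative might look something like the following minimal sketch. It assumes the StartupTaskContext class from this post is registered as a singleton. StartupHealthEndpointMiddleware is a hypothetical name, and unlike the built-in middleware it reports only the startup task status, not any other checks:

```csharp
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;

// Hypothetical stand-in for UseHealthChecks() + StartupTasksHealthCheck.
// Assumes the StartupTaskContext class from this post is available via DI.
public class StartupHealthEndpointMiddleware
{
    private readonly RequestDelegate _next;
    private readonly StartupTaskContext _context;
    private readonly PathString _path;

    public StartupHealthEndpointMiddleware(
        RequestDelegate next, StartupTaskContext context, PathString path)
    {
        _next = next;
        _context = context;
        _path = path;
    }

    public async Task Invoke(HttpContext httpContext)
    {
        // Only handle the health check path (PathString comparison ignores case)
        if (httpContext.Request.Path != _path)
        {
            await _next(httpContext);
            return;
        }

        var isHealthy = _context.IsComplete;
        var response = httpContext.Response;
        response.StatusCode = isHealthy
            ? StatusCodes.Status200OK
            : StatusCodes.Status503ServiceUnavailable;
        response.ContentType = "text/plain";
        await response.WriteAsync(isHealthy ? "Healthy" : "Unhealthy");
    }
}
```

You would register it at the start of the pipeline with app.UseMiddleware<StartupHealthEndpointMiddleware>(new PathString("/healthz")). The extra argument is passed to the constructor, while RequestDelegate is supplied by the framework and StartupTaskContext is resolved from DI.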

The one requirement we haven't satisfied yet is that non-health-check traffic shouldn't be able to invoke our regular middleware pipeline / MVC actions until after all the startup tasks have completed. Ideally that wouldn't be possible anyway, as a load balancer should wait for the health check to return 200 before routing traffic to your app. But better safe than sorry!

Middleware

The middleware we need is very simple: it needs to return an error response code if the startup tasks have not finished. If the tasks are complete, it does nothing and simply passes the request on to the rest of the middleware pipeline.

The following custom middleware does the job, and adds a "Retry-After" header to the response if the StartupTaskContext indicates the tasks aren't complete. You could extract things like the Retry-After value, the plain-text response ("Service Unavailable"), or even the response code to a configuration object, but I kept it simple for this post:

using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;

public class StartupTasksMiddleware
{
    private readonly StartupTaskContext _context;
    private readonly RequestDelegate _next;

    public StartupTasksMiddleware(StartupTaskContext context, RequestDelegate next)
    {
        _context = context;
        _next = next;
    }

    public async Task Invoke(HttpContext httpContext)
    {
        if (_context.IsComplete)
        {
            await _next(httpContext);
        }
        else
        {
            var response = httpContext.Response;
            response.StatusCode = 503;
            response.Headers["Retry-After"] = "30";
            await response.WriteAsync("Service Unavailable");
        }
    }
}
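The earlier suggestion of extracting the Retry-After value, the response text, and the status code into a configuration object could be sketched with the standard options pattern (IOptions<T>). StartupTasksMiddlewareOptions and ConfigurableStartupTasksMiddleware are hypothetical names chosen for illustration, and the code assumes the StartupTaskContext class from this post:

```csharp
using System.Globalization;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.Options;

// Hypothetical options class; the defaults match the hard-coded values above
public class StartupTasksMiddlewareOptions
{
    public int StatusCode { get; set; } = StatusCodes.Status503ServiceUnavailable;
    public int RetryAfterSeconds { get; set; } = 30;
    public string ResponseText { get; set; } = "Service Unavailable";
}

// Same behaviour as StartupTasksMiddleware, but reading its values from options
public class ConfigurableStartupTasksMiddleware
{
    private readonly RequestDelegate _next;
    private readonly StartupTaskContext _context;
    private readonly StartupTasksMiddlewareOptions _options;

    public ConfigurableStartupTasksMiddleware(
        RequestDelegate next,
        StartupTaskContext context,
        IOptions<StartupTasksMiddlewareOptions> options)
    {
        _next = next;
        _context = context;
        _options = options.Value;
    }

    public async Task Invoke(HttpContext httpContext)
    {
        if (_context.IsComplete)
        {
            await _next(httpContext);
            return;
        }

        var response = httpContext.Response;
        response.StatusCode = _options.StatusCode;
        response.Headers["Retry-After"] =
            _options.RetryAfterSeconds.ToString(CultureInfo.InvariantCulture);
        await response.WriteAsync(_options.ResponseText);
    }
}
```

You could then override the values in ConfigureServices with services.Configure<StartupTasksMiddlewareOptions>(o => o.RetryAfterSeconds = 10) and register app.UseMiddleware<ConfigurableStartupTasksMiddleware>() in place of StartupTasksMiddleware. If Configure is never called, the property defaults are used, since a standard ASP.NET Core app already registers the options services.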

Register the middleware just after the health check middleware. Any traffic that makes it past the health check will be stopped if the startup tasks have not completed.

public void Configure(IApplicationBuilder app, IHostingEnvironment env)
{
    app.UseHealthChecks("/healthz");
    app.UseMiddleware<StartupTasksMiddleware>();
    
    // ... Other middleware config
}

Now if you run your app and hit a non-health-check endpoint you'll get a 503 initially. Normal functionality is restored once your startup tasks have completed.

An example of the home page of the app returning a 503 until the startup tasks have completed

Summary

The approach in this post was suggested to me by Damian Hickey. It uses the health check functionality from ASP.NET Core to indicate to load balancers and orchestrators whether or not an app is ready to run. This ensures Kestrel starts as soon as possible and consequently should reduce the number of network timeouts in load balancers compared to the approaches I described in part 2 of this series. You can find a full example of the code in this post on GitHub.

Andrew Lock | .NET Escapades