Reducing initial request latency by pre-building services in a startup task in ASP.NET Core

This post follows on somewhat from my recent posts on running async startup tasks in ASP.NET Core. Rather than discuss a general approach to running startup tasks, this post discusses an example of a startup task that was suggested by Ruben Bartelink. It describes an interesting way to try to reduce the latencies seen by apps when they've just started, by pre-building all the singletons registered with the DI container.

The latency hit on first request

The ASP.NET Core framework is really fast, there's no doubt about that. Throughout its development there's been a huge focus on performance, even driving the development of new high-performance .NET types like Span<T> and System.IO.Pipelines.

However, you can't just have framework code in your applications. Inevitably, developers have to put some actual functionality in their apps, and if performance isn't a primary focus, things can start to slow down. As the app gets bigger, more and more services are registered with the DI container, you pull in data from multiple locations, and you add extra features where they're needed.

The first request after an app starts up is particularly susceptible to slowing down. There's lots of work that has to be done before a response can be sent. However, this work often only has to be done once; subsequent requests have much less work to do, so they complete faster.

I decided to do a quick test of a very simple app, to see the difference between that first request and subsequent requests. I created the default ASP.NET Core web template with individual authentication using the .NET Core 2.2 SDK:

dotnet new webapp --auth Individual --name test

For simplicity, I tweaked the logging in appsettings.json to write request durations to the console in the Production environment:

{
  "Logging": {
    "LogLevel": {
      "Default": "Warning",
      "Microsoft.AspNetCore.Hosting.Internal.WebHost": "Information"
    }
  }
}

I then built the app in Release mode, and published it to a local folder. I navigated to the output folder and ran the app:

> dotnet publish -c Release -o ..\..\dist
> cd ..\..\dist
> dotnet test.dll

Hosting environment: Production
Now listening on: http://localhost:5000
Now listening on: https://localhost:5001
Application started. Press Ctrl+C to shut down.

Next I hit the home page of the app (https://localhost:5001) and recorded the duration for the first request logged to the console. I hit Ctrl+C to close the app, started it again, and recorded another duration for the "first request".

Obviously this isn't very scientific, and it's not a proper benchmark, but I just wanted a feel for it. For those interested, I'm using a Dell XPS 15" 9560, which has an i7-7700 and 32GB RAM.

I ran the "first request" test 20 times, and got the mean results shown below. I also recorded the times for the second and third requests.

Request	Mean duration ± standard deviation
1st request	315ms ± 12ms
2nd request	4.3ms ± 0.6ms
3rd request	1.4ms ± 0.3ms

After the 3rd request, all subsequent requests took a similar amount of time.

As you can see, there's a big difference between the first request and the second request. I didn't dive too much into where all this comes from, but some quick tests show that the vast majority of the initial hit is due to rendering Razor. As a quick test, I added a simple API controller to the app:

public class ValuesController : Controller
{
    [HttpGet("/warmup")]
    public string Index() => "OK";
}

Hitting this controller for the first request instead of the default Razor Index page drops the first request time to ~90ms. Removing the MVC middleware entirely (and responding with a 404) drops it to ~45ms.

Pre-creating singleton services before the first request

So where is all this latency coming from for the first request? And is there a way we can reduce it so the first user to hit the site after a deploy isn't penalised as heavily?

To be honest, I didn't dive in too far. For my experiments, I wanted to test one potential mitigation proposed by Ruben Bartelink: instantiating all the singletons registered with the DI container before the first request.

Services registered as singletons are only created once in the lifetime of the app. If they're used by the ASP.NET Core framework to handle a request, then they'll need to be created during the first request. If we create all the possible singletons before the first request then that should reduce the duration of the first request.

To test this theory, I created a startup task that would instantiate most of the singletons registered with the DI container before the app starts handling requests properly. The example below uses the "IServer decorator" approach I described in part 2 of my series on async startup tasks, but that's not important; you could also use the RunWithTasksAsync approach, or the health checks approach I described in part 4.

The WarmupServicesStartupTask is shown below. I'll discuss the code shortly.

public class WarmupServicesStartupTask : IStartupTask
{
    private readonly IServiceCollection _services;
    private readonly IServiceProvider _provider;
    public WarmupServicesStartupTask(IServiceCollection services, IServiceProvider provider)
    {
        _services = services;
        _provider = provider;
    }

    public Task ExecuteAsync(CancellationToken cancellationToken)
    {
        foreach (var singleton in GetSingletons(_services))
        {
            // may be registered more than once, so get all at once
            _provider.GetServices(singleton);
        }

        return Task.CompletedTask;
    }

    static IEnumerable<Type> GetSingletons(IServiceCollection services)
    {
        return services
            .Where(descriptor => descriptor.Lifetime == ServiceLifetime.Singleton)
            .Where(descriptor => descriptor.ImplementationType != typeof(WarmupServicesStartupTask))
            .Where(descriptor => descriptor.ServiceType.ContainsGenericParameters == false)
            .Select(descriptor => descriptor.ServiceType)
            .Distinct();
    }
}

The WarmupServicesStartupTask class implements IStartupTask (from part 2 of my series) which requires that you implement ExecuteAsync(). This fetches all of the singleton registrations out of the injected IServiceCollection, and tries to instantiate them with the IServiceProvider. Note that I call GetServices() (plural) rather than GetService() as each service could have more than one implementation. Once all services have been created, the task is complete.

The IServiceCollection is where you register your implementations and factory functions inside Startup.ConfigureServices. The IServiceProvider is created from the service descriptors in IServiceCollection, and is responsible for actually instantiating services when they're required.

The GetSingletons() method is what identifies the services we're going to instantiate. It loops through all the ServiceDescriptors in the collection, and filters to only singletons. We also exclude the WarmupServicesStartupTask itself to avoid any potential weird recursion. Next we filter out any services that are open generics (like ILogger<T>) - trying to instantiate those would be complicated by having to take into account type constraints, so I chose to just ignore them. Finally, we select the type of the service, and get rid of any duplicates.
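To make the filtering concrete, here's a minimal sketch that applies the same filter to a tiny, made-up service collection. The types IGreeter, EnglishGreeter, FrenchGreeter, ICache<T>, InMemoryCache<T>, and RequestContext are hypothetical and exist only for this illustration, and it assumes the Microsoft.Extensions.DependencyInjection package is referenced.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.Extensions.DependencyInjection;

// Hypothetical services, used only for this illustration
public interface IGreeter { }
public class EnglishGreeter : IGreeter { }
public class FrenchGreeter : IGreeter { }
public interface ICache<T> { }
public class InMemoryCache<T> : ICache<T> { }
public class RequestContext { }

public static class SingletonFilterDemo
{
    // Same filter as GetSingletons() above, minus the self-exclusion check
    public static List<Type> GetSingletons(IServiceCollection services)
    {
        return services
            .Where(descriptor => descriptor.Lifetime == ServiceLifetime.Singleton)
            .Where(descriptor => descriptor.ServiceType.ContainsGenericParameters == false)
            .Select(descriptor => descriptor.ServiceType)
            .Distinct()
            .ToList();
    }

    public static ServiceCollection BuildServices()
    {
        var services = new ServiceCollection();
        services.AddSingleton<IGreeter, EnglishGreeter>();   // singleton: kept
        services.AddSingleton<IGreeter, FrenchGreeter>();    // same service type: collapsed by Distinct()
        services.AddSingleton(typeof(ICache<>), typeof(InMemoryCache<>)); // open generic: skipped
        services.AddScoped<RequestContext>();                 // not a singleton: skipped
        return services;
    }

    public static int CountGreeters()
    {
        using (var provider = BuildServices().BuildServiceProvider())
        {
            // GetServices() (plural) resolves every implementation registered for the type
            return provider.GetServices(typeof(IGreeter)).Count();
        }
    }
}
```

In this sketch the filter should leave only IGreeter, and asking the provider for it with GetServices() creates both registered implementations, which is why the startup task uses the plural method.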

By default, the IServiceCollection itself isn't added to the DI container, so we have to add that registration at the same time as registering our WarmupServicesStartupTask:

public void ConfigureServices(IServiceCollection services)
{
    //Other registrations
    services
        .AddStartupTask<WarmupServicesStartupTask>()
        .TryAddSingleton(services);
}
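If you haven't read the earlier posts, the registration above relies on IStartupTask and AddStartupTask<T>(). The sketch below is my assumption of roughly what those look like, not necessarily the exact definitions from the series; NoOpStartupTask and RegistrationDemo are hypothetical names used only to show the registration working end to end.

```csharp
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.DependencyInjection.Extensions;

// Assumed shape of the interface from the earlier posts
public interface IStartupTask
{
    Task ExecuteAsync(CancellationToken cancellationToken);
}

public static class StartupTaskExtensions
{
    // Assumed helper: registers the task so the host can resolve every IStartupTask
    public static IServiceCollection AddStartupTask<T>(this IServiceCollection services)
        where T : class, IStartupTask
    {
        return services.AddTransient<IStartupTask, T>();
    }
}

// Hypothetical task that does nothing, just to have something to register
public class NoOpStartupTask : IStartupTask
{
    public Task ExecuteAsync(CancellationToken cancellationToken) => Task.CompletedTask;
}

public static class RegistrationDemo
{
    public static bool CollectionResolvesToItself()
    {
        var services = new ServiceCollection();
        services
            .AddStartupTask<NoOpStartupTask>()
            // Explicit type argument: the local is a ServiceCollection, and we
            // want the registration keyed on the IServiceCollection interface
            .TryAddSingleton<IServiceCollection>(services);

        using (var provider = services.BuildServiceProvider())
        {
            // The container hands back the same collection it was built from
            return ReferenceEquals(provider.GetRequiredService<IServiceCollection>(), services);
        }
    }
}
```

Inside ConfigureServices the services parameter is already typed as IServiceCollection, so the type argument can be inferred there, as in the registration shown earlier.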

And that's all there is to it. I repeated the test again with the WarmupServicesStartupTask, and compared the results to the previous attempt:

Request	Mean duration ± standard deviation
1st request, no warmup	315ms ± 12ms
1st request, with warmup	289ms ± 11ms

I know, right! Almost knocked you off your chair. We shaved 26ms off the first request time.

I have to admit, I was a bit underwhelmed. I didn't expect an enormous difference, but still, it was a tad disappointing. On the positive side, it is roughly an 8% reduction of the first request duration and required very little effort, so it's not all bad.

Just to make myself feel better about it, I did an unpaired t-test between the two apps and found that there was a statistically significant difference between the two samples.

	Value
t	7.1287
degrees of freedom	38
standard error of difference	3.589
p	<0.0001
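As a rough sanity check, you can approximately reproduce these values from the rounded summary statistics above. This assumes a pooled (equal-variance) t-test with 20 runs per sample, which is consistent with the 38 degrees of freedom:

```latex
s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}
    = \sqrt{\frac{19 \cdot 12^2 + 19 \cdot 11^2}{38}} \approx 11.51\ \text{ms}

SE = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}
   = 11.51 \sqrt{\frac{1}{20} + \frac{1}{20}} \approx 3.64\ \text{ms}

t = \frac{\bar{x}_1 - \bar{x}_2}{SE} = \frac{315 - 289}{3.64} \approx 7.14
```

The small differences from the table (3.589 and 7.1287) are what you'd expect from working with rounded means and standard deviations rather than the raw measurements.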

Still, I wondered if we could do better.

Creating all services before the first request

Creating singleton services makes a lot of sense as a way to reduce first request latency. Assuming the services will be required at some point in the lifetime of the app, we may as well take the hit instantiating them before the app starts, instead of in the context of a request. This only gave a marginal improvement for the default template, but larger apps may well see a much bigger improvement.

Instead of just creating the singletons, I wondered if we could just create all of the services our app uses in the startup task; not only the singletons, but the scoped and transient services.

On the face of it, it seems like this shouldn't give any real improvement. Scoped services are created new for each request, and are thrown away at the end (when the scope ends). And transient services are created new every time. But there's always the possibility that creating a scoped service could require additional bootstrapping code that isn't required by singleton services, so I gave it a try.

I updated the WarmupServicesStartupTask to the following:

public class WarmupServicesStartupTask : IStartupTask
{
    private readonly IServiceCollection _services;
    private readonly IServiceProvider _provider;
    public WarmupServicesStartupTask(IServiceCollection services, IServiceProvider provider)
    {
        _services = services;
        _provider = provider;
    }

    public Task ExecuteAsync(CancellationToken cancellationToken)
    {
        using (var scope = _provider.CreateScope())
        {
            foreach (var service in GetServices(_services))
            {
                scope.ServiceProvider.GetServices(service);
            }
        }

        return Task.CompletedTask;
    }

    static IEnumerable<Type> GetServices(IServiceCollection services)
    {
        return services
            .Where(descriptor => descriptor.ImplementationType != typeof(WarmupServicesStartupTask))
            .Where(descriptor => descriptor.ServiceType.ContainsGenericParameters == false)
            .Select(descriptor => descriptor.ServiceType)
            .Distinct();
    }
}

This implementation makes two changes:

GetSingletons() is renamed to GetServices(), and no longer filters the services to singletons only.
ExecuteAsync() creates a new IServiceScope before requesting the services, so that the scoped services are properly disposed at the end of the task.
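To illustrate the second point, here's a minimal sketch showing that a disposable scoped service resolved from a scope is disposed along with that scope. TrackedScopedService and ScopeDemo are hypothetical names introduced just for this example.

```csharp
using System;
using Microsoft.Extensions.DependencyInjection;

// Hypothetical scoped service that counts how often it is created and disposed
public class TrackedScopedService : IDisposable
{
    public static int Created;
    public static int Disposed;

    public TrackedScopedService() { Created++; }
    public void Dispose() { Disposed++; }
}

public static class ScopeDemo
{
    public static void Run()
    {
        TrackedScopedService.Created = 0;
        TrackedScopedService.Disposed = 0;

        var services = new ServiceCollection();
        services.AddScoped<TrackedScopedService>();

        using (var provider = services.BuildServiceProvider())
        {
            using (var scope = provider.CreateScope())
            {
                // Resolve from the scope's provider, not the root provider,
                // so the instance is owned (and later disposed) by the scope
                scope.ServiceProvider.GetServices(typeof(TrackedScopedService));
            }
            // The scope has been disposed here, taking the scoped instance with it
        }
    }
}
```

After Run() completes, the Created and Disposed counters should match, which is the behaviour the warmup task relies on to avoid leaving scoped instances hanging around once it has finished.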

I ran the test again, and got some slightly surprising results. The table below shows the first request time without using the startup task (top), when using the startup task to only create singletons (middle), and when using the startup task to create all the services (bottom).

Request	Mean duration ± standard deviation
1st request, no warmup	315ms ± 12ms
1st request, singleton warmup	289ms ± 11ms
1st request, all services warmup	198ms ± 8ms

Figure: Graph of the above results

That's a mean reduction in first request duration of 117ms, or 37%. No need for the t-test to prove significance here! I can only assume that instantiating some of the scoped/transient services triggers some lazy initialization which then doesn't have to be performed when a real request is received. JIT compilation time may well be coming into play too.

Even with the startup task, there's still a big difference between the first request duration and the second and third requests, which are only 4ms and 1ms respectively. It seems very likely that there's more that could be done here to trigger all the necessary MVC components to initialize themselves, but I couldn't see an obvious way, short of sending a real request to the app.

It's worth remembering that the startup task approach shown here shouldn't only improve the duration of the very first request. As different parts of your app are hit for the first time, most initialisation should already have happened, hopefully smoothing out the spikes in request duration for your app. But your mileage may vary!

Summary

In this post I showed how to create a startup task that loads all the singletons registered with the DI container on app startup, before the first request is received. I showed that loading all services, not just singletons, gave a large reduction in the duration of the first request. Whether this task will be useful in practice will likely depend on your application, but it's simple to create and add, so it might be worth trying out! Thanks again to Ruben Bartelink for suggesting it.

Andrew Lock | .NET Escapades