I was browsing the ASP.NET Core MVC GitHub repo the other day, checking out the new 1.1.0 Preview 1 code, when I spotted a usage of ConcurrentDictionary that I thought was interesting. This post explores the GetOrAdd function, the level of thread safety it provides, and ways to add additional threading constraints.

I was looking at the code that enables using middleware as MVC filters where they are building up a filter pipeline. This needs to be thread-safe, so they sensibly use a ConcurrentDictionary<>, but instead of a dictionary of RequestDelegate, they are using a dictionary of Lazy<RequestDelegate>. Along with the initialisation is this comment:

// 'GetOrAdd' call on the dictionary is not thread safe and we might end up creating the pipeline more
// once. To prevent this Lazy<> is used. In the worst case multiple Lazy<> objects are created for multiple
// threads but only one of the objects succeeds in creating a pipeline.
private readonly ConcurrentDictionary<Type, Lazy<RequestDelegate>> _pipelinesCache  
    = new ConcurrentDictionary<Type, Lazy<RequestDelegate>>();

This post will explore the pattern they are using and why you might want to use it in your code.

tl;dr; To make a ConcurrentDictionary only call a delegate once when using GetOrAdd, store your values as Lazy<T>, and use by calling GetOrAdd(key, valueFactory).Value.

The GetOrAdd function

The ConcurrentDictionary is a dictionary that allows you to add, fetch and remove items in a thread-safe way. If you're going to be accessing a dictionary from multiple threads, then it should be your go-to class.

The vast majority of methods it exposes are thread safe, with the notable exception of one of the GetOrAdd overloads:

TValue GetOrAdd(TKey key, Func<TKey, TValue> valueFactory);  

This overload takes a key value, and checks whether the key already exists in the database. If the key already exists, then the associated value is returned; if the key does not exist, the provided delegate is run, the value is stored in the dictionary, and then returned to the caller.

For example, consider the following little program.

public static void Main(string[] args)  
{
    var dictionary = new ConcurrentDictionary<string, string>();

    var value = dictionary.GetOrAdd("key", x => "The first value");
    Console.WriteLine(value);

    value = dictionary.GetOrAdd("key", x => "The second value");
    Console.WriteLine(value);
}

The first time GetOrAdd is called, the dictionary is empty, so the value factory runs and returns the string "The first value", storing it against the key. On the second call, GetOrAdd finds the saved value and uses that instead of calling the factory. The output gives:

The first value  
The first value  

GetOrAdd and thread safety.

Internally, the ConcurrentDictionary uses locking to make it thread safe for most methods, but GetOrAdd does not lock while valueFactory is running. This is done to prevent unknown code from blocking all the threads, but it means that valueFactory might run more than once if it is called simultaneously from multiple threads. Thread safety kicks in when saving the returned value to the dictionary and when returning the generated value back to the caller however, so you will always get the same value back from each call.

For example, consider the program below, which uses tasks to run threads simultaneously. It works very similarly to before, but runs the GetOrAdd function on two separate threads. It also increments a counter every time the valueFactory is run.

public class Program  
{
    private static int _runCount = 0;
    private static readonly ConcurrentDictionary<string, string> _dictionary
        = new ConcurrentDictionary<string, string>();

    public static void Main(string[] args)
    {
        var task1 = Task.Run(() => PrintValue("The first value"));
        var task2 = Task.Run(() => PrintValue("The second value"));
        Task.WaitAll(task1, task2);

        PrintValue("The third value")

        Console.WriteLine($"Run count: {_runCount}");
    }

    public static void PrintValue(string valueToPrint)
    {
        var valueFound = _dictionary.GetOrAdd("key",
                    x =>
                    {
                        Interlocked.Increment(ref _runCount);
                        Thread.Sleep(100);
                        return valueToPrint;
                    });
        Console.WriteLine(valueFound);
    }
}

The PrintValue function again calls GetOrAdd on the ConcurrentDictionary, passing in a Func<> that increments the counter and returns a string. Running this program produces one of two outputs, depending on the order the threads are scheduled; either

The first value  
The first value  
The first value  
Run count: 2  

or

The second value  
The second value  
The second value  
Run count: 2  

As you can see, you will always get the same value when calling GetOrAdd, depending on which thread returns first. However the delegate is being run on both asynchronous calls, as shown by _runCount=2, as the value had not been stored from the first call before the second call runs. Stepping through, the interactions could look something like this:

  1. Thread A calls GetOrAdd on the dictionary for the key "key" but does not find it, so starts to invoke the valueFactory.

  2. Thread B also calls GetOrAdd on the dictionary for the key "key". Thread A has not yet completed, so no existing value is found, and Thread B also starts to invoke the valueFactory.

  3. Thread A completes its invocation, and returns the value "The first value" back to the concurrent dictionary. The dictionary checks there is still no value for "key", and inserts the new KeyValuePair. Finally, it returns "The first value" to the caller.

  4. Thread B completes its invocation and returns the value "The second value" back to the concurrent dictionary. The dictionary sees the value for "key" stored by Thread A, so it discards the value it created and uses that one instead, returning the value back to the caller.

  5. Thread C calls GetOrAdd and finds the value already exists for "key", so returns the value, without having to invoke valueFactory

In this case, running the delegate more than once has no adverse effects - all we care about is that the same value is returned from each call to GetOrAdd. But what if the delegate has side effects such that we need to ensure it is only run once?

Ensuring the delegate only runs once with Lazy

As we've seen, there are no guarantees made by ConcurrentDictionary about the number of times the Func<> will be called. When building a middleware pipeline, however, we need to be sure that the middleware is only built once, as it could be doing some bootstrapping that is expensive or not thread safe. The solution that the ASP.NET Core team used is to use Lazy<> initialisation.

The output we are aiming for is

The first value  
The first value  
The first value  
Run count: 1  

or similarly for "The second value" - it doesn't matter which wins out, the important points are that the same value is returned every time, and that _runCount is always 1.

Looking back at our previous example, instead of using a ConcurrentDictionary<string, string>, we create a ConcurrentDictionary<string, Lazy<string>>, and we update the PrintValue() method to create a lazy object instead:

public static void PrintValueLazy(string valueToPrint)  
{
    var valueFound = _lazyDictionary.GetOrAdd("key",
                x => new Lazy<string>(
                    () =>
                        {
                            Interlocked.Increment(ref _runCount);
                            Thread.Sleep(100);
                            return valueToPrint;
                        }));
    Console.WriteLine(valueFound.Value);
}

There are only two changes we have made here. We have updated the GetOrAdd call to return a Lazy<string> rather than a string directly, and we are calling valueFound.Value to get the actual string value to write to the console. To see why this solves the problem, lets step through the example to see an example of what happens when we run the whole program.

  1. Thread A calls GetOrAdd on the dictionary for the key "key" but does not find it, so starts to invoke the valueFactory.

  2. Thread B also calls GetOrAdd on the dictionary for the key "key". Thread A has not yet completed, so no existing value is found, and Thread B also starts to invoke the valueFactory.

  3. Thread A completes its invocation, returning an uninitialised Lazy<string> object. The delegate inside the Lazy<string> has not been run at this point, we've just created the Lazy<string> container. The dictionary checks there is still no value for "key", and so inserts the Lazy<string> against it, and finally, returns the Lazy<string> back to the caller.

  4. Thread B completes its invocation, similarly returning an uninitialised Lazy<string> object. As before, the dictionary sees the Lazy<string> object for "key" stored by Thread A, so it discards the Lazy<string> it just created and uses that one instead, returning it back to the caller.

  5. Thread A calls Lazy<string>.Value. This invokes the provided delegate in a thread safe way, such that if it is called simultaneously by two threads, it will only run the delegate once.

  6. Thread B calls Lazy<string>.Value. This is the same Lazy<string> object that Thread A just initialised (remember the dictionary ensures you always get the same value back.) If Thread A is still running the initialisation delegate, then Thread B just blocks until it finishes and it can access the result. We just get the final return string, without invoking the delegate for a second time. This is what gives us the run-once behaviour we need.

  7. Thread C calls GetOrAdd and finds the Lazy<string> object already exists for "key", so returns the value, without having to invoke valueFactory. The Lazy<string> has already been initialised, so the resulting value is returned directly.

We still get the same behaviour from the ConcurrentDictionary in that we might run the valueFactory more than once, but now we are just calling new Lazy<>() inside the factory. In the worst case, we create multiple Lazy<> objects, which get discarded by the ConcurrentDictionary when consolidating inside the GetOrAdd method.

It is the Lazy<> object which enforces that we only run our expensive delegate once. By calling Lazy<>.Value we trigger the delegate to run in a thread safe way, such that we can be sure it will only be run by one thread at a time. Other threads which call Lazy<>.Value simultaneously will be blocked until the first call completes, and then will use the same result.

Summary

When using GetOrAdd , if your valueFactory is idempotent and not expensive, then there is no need for this trick. You can be sure you will always get the same value with each call, but you need to be aware the valueFactory may run multiple times.

If you have an expensive operation that must be run only once as part of a call to GetOrAdd, then using Lazy<> is a great solution. The only caveat to be aware of is that Lazy<>.Value will block other threads trying to access the value until the first call completes. Depending on your use case, this may or may not be a problem, and is the reason GetOrAdd does not have these semantics by default.