
Reducing allocations by caching with StringBuilderCache
A deep dive on StringBuilder - Part 5

So far in this series we've looked in detail at StringBuilder, and how it works under-the-hood. In this post I look at a different type, the internal StringBuilderCache type. This type is used internally in .NET Core and .NET Framework to reduce the cost of creating a StringBuilder. In this post I describe why it's useful, run a small benchmark to see its impact, and walk through the code to show how it works.

Reducing allocations to improve performance

In the first post in this series, I discussed how .NET has focused on performance recently, with a particular focus on reducing allocations. This isn't a new problem for .NET, so in .NET 1.1 the StringBuilder class was introduced. This lets you efficiently concatenate strings, characters, and ToString()ed objects without creating a lot of intermediate strings.

However, StringBuilder itself is a class that is allocated on the heap. As we've seen throughout this series, internally, the StringBuilder uses a char[] and a linked list of StringBuilders to store the intermediate values. All of these are allocated on the heap.
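As a quick reminder of what that chunked storage looks like from the outside, here's a minimal sketch using the public StringBuilder.GetChunks() method (available from .NET Core 3.0 onwards). The ChunkDemo class and CountChunks method are names I've made up for illustration, and the exact number of chunks is an implementation detail that could differ between runtimes.

```csharp
using System;
using System.Text;

public static class ChunkDemo
{
    // Counts the internal chunks of a StringBuilder that was forced to grow.
    public static int CountChunks()
    {
        // Start with a deliberately tiny buffer so the append can't fit in it.
        var sb = new StringBuilder(capacity: 4);
        sb.Append("0123456789");

        int chunks = 0;
        foreach (ReadOnlyMemory<char> chunk in sb.GetChunks())
        {
            // Each chunk is backed by a separately allocated char[] on the heap.
            chunks++;
        }

        return chunks;
    }
}
```

On the runtimes I've tried, the builder grows by adding a chunk rather than copying into one bigger array, so you should expect more than one chunk here.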

In cases where you're doing a lot of string concatenation, the instances of the StringBuilder class (including the internal linked values) and the internal char[] buffer can put some pressure on the GC. That's where StringBuilderCache comes in.

Using StringBuilderCache to reduce StringBuilder allocations

StringBuilderCache is an internal class that has been present in .NET Framework and .NET Core for a looong time (I couldn't figure out exactly when, but it's since at least 2014, so .NET 4.5-ish). Being internal it's not directly usable by user code, but it's used by various classes in the heart of .NET.

The observation behind StringBuilderCache is that in most cases where we need to build up a string, the final string will be relatively small. For example, when formatting dates and times, you expect a short result. There are many similar cases where you know the final string will be small, but also that the function will be called frequently.

StringBuilderCache works (perhaps unsurprisingly) by caching a StringBuilder instance, and "loaning" it out whenever a StringBuilder is required. Calling code can request a StringBuilder instance and return it to the cache when it's finished with it. That means only a single StringBuilder instance needs to be created per thread, as it can keep being re-used, reducing GC pressure on the app.

If your first thought is "that doesn't sound thread-safe", don't worry. As you'll see later, there's a single StringBuilder per thread, so that isn't a problem.

Let's take this toy sample which concatenates a user's name using the StringBuilderCache.

var user = new User
{
    FirstName = "Andrew",
    LastName = "Lock",
    Nickname = "Sock",
};

int requiredCapacity = user.FirstName.Length
                       + user.LastName.Length
                       + user.Nickname.Length
                       + 3;

// Fetch a StringBuilder of the required capacity. Instead of
// var sb = new StringBuilder(requiredCapacity);
StringBuilder sb = StringBuilderCache.Acquire(requiredCapacity);

sb.Append(user.FirstName);
sb.Append(user.LastName);
sb.Append(" (");
sb.Append(user.Nickname);
sb.Append(')');

// return the StringBuilder to the cache and retrieve the string. Instead of
// string fullName = sb.ToString();
string fullName = StringBuilderCache.GetStringAndRelease(sb);

As you can see, using StringBuilderCache is pretty simple, and mostly analogous to using a StringBuilder directly. The question is, does it improve performance?

Benchmarking StringBuilderCache

To see the impact of using StringBuilderCache over StringBuilder directly for a simple snippet like the above, I turned to BenchmarkDotNet. I copied the .NET 5 implementation of StringBuilderCache into my project (we'll look at the implementation shortly), and created the following simple benchmark, directly analogous to the above example:

[MemoryDiagnoser]
public class StringBuilderBenchmark
{
    private const string FirstName = "Andrew";
    private const string LastName = "Lock";
    private const string Nickname = "Sock";


    [Benchmark(Baseline = true)]
    public string UsingStringBuilder()
    {
        var sb = new StringBuilder();

        sb.Append(FirstName);
        sb.Append(LastName);
        sb.Append(" (");
        sb.Append(Nickname);
        sb.Append(')');

        return sb.ToString();
    }

    [Benchmark]
    public string UsingStringBuilderCache()
    {
        var sb = StringBuilderCache.Acquire();

        sb.Append(FirstName);
        sb.Append(LastName);
        sb.Append(" (");
        sb.Append(Nickname);
        sb.Append(')');

        return StringBuilderCache.GetStringAndRelease(sb);
    }
}

The results, running on my relatively old home laptop, are as follows:


BenchmarkDotNet=v0.13.0, OS=Windows 10.0.19042.1052 (20H2/October2020Update)
Intel Core i7-7500U CPU 2.70GHz (Kaby Lake), 1 CPU, 4 logical and 2 physical cores
.NET SDK=5.0.104
  [Host]     : .NET 5.0.7 (5.0.721.25508), X64 RyuJIT
  DefaultJob : .NET 5.0.7 (5.0.721.25508), X64 RyuJIT


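| Method                  | Mean     | Error    | StdDev    | Ratio | Gen 0  | Gen 1 | Allocated |
|-------------------------|---------:|---------:|----------:|------:|-------:|------:|----------:|
| UsingStringBuilder      | 87.76 ns | 1.832 ns |  2.382 ns |  1.00 | 0.1262 |     - |     264 B |
| UsingStringBuilderCache | 67.56 ns | 3.670 ns | 10.588 ns |  0.69 | 0.0267 |     - |      56 B |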

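As you can see, using the StringBuilderCache gives a relative speed boost of around 30% (a Ratio of 0.69) and allocates roughly a fifth as much memory (56 vs 264 bytes). The StdDev on the cached run is fairly large, so treat the exact timing figure with some caution.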

Obviously, these are small speedups, but on a hot path, these sorts of micro-optimisations can be worthwhile.

We've looked at the benefit StringBuilderCache can bring. The next question is: how does it do it?

Looking at the implementation of StringBuilderCache

You can find the latest implementation of StringBuilderCache for .NET on GitHub, which is the implementation I show below. I'll give the whole implementation, and then discuss it below.

This version uses nullable reference types. You can also find an implementation for .NET Framework on https://referencesource.microsoft.com.


namespace System.Text
{
    /// <summary>Provide a cached reusable instance of stringbuilder per thread.</summary>
    internal static class StringBuilderCache
    {
        // The value 360 was chosen in discussion with performance experts as a compromise between using
        // as little memory per thread as possible and still covering a large part of short-lived
        // StringBuilder creations on the startup path of VS designers.
        internal const int MaxBuilderSize = 360;
        private const int DefaultCapacity = 16; // == StringBuilder.DefaultCapacity

        [ThreadStatic]
        private static StringBuilder? t_cachedInstance;

        /// <summary>Get a StringBuilder for the specified capacity.</summary>
        /// <remarks>If a StringBuilder of an appropriate size is cached, it will be returned and the cache emptied.</remarks>
        public static StringBuilder Acquire(int capacity = DefaultCapacity)
        {
            if (capacity <= MaxBuilderSize)
            {
                StringBuilder? sb = t_cachedInstance;
                if (sb != null)
                {
                    // Avoid stringbuilder block fragmentation by getting a new StringBuilder
                    // when the requested size is larger than the current capacity
                    if (capacity <= sb.Capacity)
                    {
                        t_cachedInstance = null;
                        sb.Clear();
                        return sb;
                    }
                }
            }

            return new StringBuilder(capacity);
        }

        /// <summary>Place the specified builder in the cache if it is not too big.</summary>
        public static void Release(StringBuilder sb)
        {
            if (sb.Capacity <= MaxBuilderSize)
            {
                t_cachedInstance = sb;
            }
        }

        /// <summary>ToString() the stringbuilder, Release it to the cache, and return the resulting string.</summary>
        public static string GetStringAndRelease(StringBuilder sb)
        {
            string result = sb.ToString();
            Release(sb);
            return result;
        }
    }
}

The code is helpfully heavily commented, but let's walk through it anyway. I'm going to start at the end, and look at the GetStringAndRelease and Release methods first.

internal const int MaxBuilderSize = 360;

[ThreadStatic]
private static StringBuilder? t_cachedInstance;

public static string GetStringAndRelease(StringBuilder sb)
{
    string result = sb.ToString();
    Release(sb);
    return result;
}

public static void Release(StringBuilder sb)
{
    if (sb.Capacity <= MaxBuilderSize)
    {
        t_cachedInstance = sb;
    }
}

The GetStringAndRelease() method is very simple: it calls ToString() on the provided StringBuilder, calls Release() on the builder, and then returns the string.

The Release method is where the "caching" happens. The method checks whether the provided StringBuilder's current capacity is no greater than the MaxBuilderSize constant (360), and if so, it stores the StringBuilder in the ThreadStatic t_cachedInstance field.

As mentioned in the code comments, the value of 360 is chosen to be large enough to be useful, but not so large that a lot of memory is used per thread. If this check wasn't here, and you released a StringBuilder with a large capacity, then you'd forever be holding on to that memory without releasing it, essentially causing a memory leak.

Marking the t_cachedInstance as [ThreadStatic] means that each separate thread in your application will see a different StringBuilder instance in t_cachedInstance. This avoids any chance of concurrency issues due to multiple threads accessing the field.
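To make the per-thread behaviour concrete, here's a small sketch that is independent of StringBuilderCache itself. The ThreadStaticDemo class and its members are hypothetical names I've picked for illustration; the only real API involved is the [ThreadStatic] attribute and System.Threading.Thread.

```csharp
#nullable enable
using System;
using System.Text;
using System.Threading;

public static class ThreadStaticDemo
{
    // Every thread gets its own copy of this field, initially null.
    [ThreadStatic]
    private static StringBuilder? t_builder;

    // Returns true if a newly started thread does NOT see the value
    // that the calling thread stored in the field.
    public static bool OtherThreadSeesNull()
    {
        // Store a builder in the calling thread's copy of the field.
        t_builder = new StringBuilder("set on the calling thread");

        bool sawNull = false;
        var worker = new Thread(() =>
        {
            // This reads the worker thread's own copy of the field.
            sawNull = t_builder is null;
        });
        worker.Start();
        worker.Join();

        return sawNull;
    }
}
```

The worker thread never sees the builder stored by the calling thread, which is why two threads can't end up appending to the same cached StringBuilder.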

That covers the release part of the cache, so let's look at the acquire part now:

internal const int MaxBuilderSize = 360;
private const int DefaultCapacity = 16; // == StringBuilder.DefaultCapacity

[ThreadStatic]
private static StringBuilder? t_cachedInstance;

public static StringBuilder Acquire(int capacity = DefaultCapacity)
{
    if (capacity <= MaxBuilderSize)
    {
        StringBuilder? sb = t_cachedInstance;
        if (sb != null)
        {
            // Avoid stringbuilder block fragmentation by getting a new StringBuilder
            // when the requested size is larger than the current capacity
            if (capacity <= sb.Capacity)
            {
                t_cachedInstance = null;
                sb.Clear();
                return sb;
            }
        }
    }

    return new StringBuilder(capacity);
}

When you call Acquire, you request a capacity for the StringBuilder. If the capacity is bigger than the cache's maximum capacity, then we bypass the cached value entirely, and just return a new StringBuilder. Similarly, if we haven't cached a StringBuilder yet, you just get a new one. For these cases, the StringBuilderCache doesn't add any value.

We also check that the requested capacity is no greater than the cached StringBuilder's capacity. As mentioned in the comment, if we returned a StringBuilder with a capacity smaller than the requested capacity, we could be pretty much certain we'd have to grow it. That's fine, but it has a performance impact, so in these cases it's better to just return a new StringBuilder.

If you're in the sweet spot (requesting a capacity no greater than both MaxBuilderSize and the cached StringBuilder.Capacity) then you can reuse the cached instance. The cache field is set to null (so if you call Acquire again before Release, you don't get the same builder twice), and the StringBuilder is "reset" by calling Clear(). You can then use the StringBuilder as normal, finally calling GetStringAndRelease() to retrieve your built value, and to (potentially) add the builder to the cache.
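The different paths through Acquire and Release are easier to see by checking object identity. The sketch below uses a trimmed local copy of the cache (LocalStringBuilderCache and CacheDemo are hypothetical names of mine, and I've folded Release into GetStringAndRelease to keep it short), so treat it as an illustration of the logic above rather than the real internal type.

```csharp
#nullable enable
using System;
using System.Text;

// Hypothetical local copy of the internal type, trimmed to the essentials.
public static class LocalStringBuilderCache
{
    private const int MaxBuilderSize = 360;

    [ThreadStatic]
    private static StringBuilder? t_cachedInstance;

    public static StringBuilder Acquire(int capacity = 16)
    {
        if (capacity <= MaxBuilderSize)
        {
            StringBuilder? sb = t_cachedInstance;
            if (sb != null && capacity <= sb.Capacity)
            {
                t_cachedInstance = null; // the builder is now "on loan"
                sb.Clear();
                return sb;
            }
        }

        return new StringBuilder(capacity);
    }

    public static string GetStringAndRelease(StringBuilder sb)
    {
        string result = sb.ToString();
        if (sb.Capacity <= MaxBuilderSize)
        {
            t_cachedInstance = sb; // only small builders are kept
        }

        return result;
    }
}

public static class CacheDemo
{
    public static (bool Reused, bool SharedWhileOnLoan, bool TooSmallReused, bool TooBigCached) Run()
    {
        StringBuilder first = LocalStringBuilderCache.Acquire(32);
        LocalStringBuilderCache.GetStringAndRelease(first);

        // Sweet spot: the default capacity fits in the cached builder.
        StringBuilder second = LocalStringBuilderCache.Acquire();
        bool reused = ReferenceEquals(first, second);

        // The cache is empty while 'second' is on loan, so this is a new builder.
        StringBuilder third = LocalStringBuilderCache.Acquire();
        bool sharedWhileOnLoan = ReferenceEquals(second, third);

        // Asking for more than the cached builder's capacity gives a new builder.
        LocalStringBuilderCache.GetStringAndRelease(second);
        StringBuilder fourth = LocalStringBuilderCache.Acquire(second.Capacity + 1);
        bool tooSmallReused = ReferenceEquals(second, fourth);

        // A builder over MaxBuilderSize is never stored in the cache.
        var big = new StringBuilder(1000);
        LocalStringBuilderCache.GetStringAndRelease(big);
        StringBuilder fifth = LocalStringBuilderCache.Acquire();
        bool tooBigCached = ReferenceEquals(big, fifth);

        LocalStringBuilderCache.GetStringAndRelease(fifth);
        return (reused, sharedWhileOnLoan, tooSmallReused, tooBigCached);
    }
}
```

Only the first flag should come back true: the builder is reused in the sweet spot, and every other path falls back to a fresh StringBuilder.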

That's all there is to it: a simple, single-value cache for StringBuilders. In the worst case it's no worse than using new StringBuilder(), and in the best case you can avoid a few allocations.

Using StringBuilderCache in your own projects

The only downside to StringBuilderCache is that you can't easily use it in your own projects! StringBuilderCache is internal, so there's no way to use it directly outside the core .NET libraries.

Luckily, the code is simple enough (and the license permissive enough) that you can generally copy-paste the implementation into your own code. As an example, we use a similar implementation in the Datadog .NET Tracer library.

Another possibility, if you're trying to reduce the impact of StringBuilders on a hot path, is to look at another internal type, ValueStringBuilder. I'll look at this type in another post.

Summary

In this post I discussed the need to reduce allocations for performance reasons, and the role of StringBuilder in helping with that. However, the StringBuilder class itself must be allocated. StringBuilderCache provides a way to reduce the impact of allocating a StringBuilder by reusing a single StringBuilder instance per thread. I showed in a micro-benchmark that this can reduce allocation and improve performance. I then walked through the code to show how it was achieved.

Andrew Lock | .Net Escapades
