Andrew Lock

~9 min read

Source generator updates: incremental generators

Exploring .NET 6 - Part 9

In the previous post, I described the LoggerMessage source generator that can give performance benefits without the boilerplate. In this post I look at the updates to the source generator API in .NET 6, why the changes were made, and how to update your source generators to use them.

Source generators in .NET 5

Source generators were added in .NET 5, and while the core .NET libraries themselves didn't make much use of them, the community quickly stepped in to add source generators for all sorts of things. I also dabbled with using source generators to replace things that would otherwise need to be done using reflection.

The .NET 5 source generator API (or, strictly, the Roslyn 3.x library) requires several basic things:

  • Your project must target netstandard2.0
  • Your generator must implement ISourceGenerator
  • It must be decorated with the [Generator] attribute

The ISourceGenerator API requires you implement two methods:

using Microsoft.CodeAnalysis;

public interface ISourceGenerator
{
    void Initialize(GeneratorInitializationContext context);
    void Execute(GeneratorExecutionContext context);
}

The Initialize() method is passed a context variable, on which you can call two methods:

  • RegisterForPostInitialization()—this allows you to register a callback that will be called once. It is useful for efficiently adding "constant" sources to the compilation. For example, you might add "marker attributes" to the compilation, that are relied on by later stages in the source generator.
  • RegisterForSyntaxNotifications()—this allows you to create an ISyntaxReceiver. You can think of an ISyntaxReceiver as the "first stage" in many source generators. It will be called every time there is a generation with each syntax node in the project, so it can take note of any "interesting" nodes for use by the generator later.

The Execute() method of the source generator is passed a different context variable. There are various interesting methods on this context, but for our purposes we care about:

  • SyntaxReceiver - this is the ISyntaxReceiver that was created and populated at the start of the generation phase. Typically it contains references to the interesting pieces of syntax that you will use to generate your sources.
  • AddSource(String, SourceText)—used to add the result of your source generation to the compilation.
  • ReportDiagnostic(Diagnostic)—used to add an analyzer Diagnostic to the compilation (for example to describe incorrect usage of your source generation marker attributes etc).

These were the basic components of .NET 5/Roslyn 3.x source generators, in summary:

  1. A way to add constant sources to the compilation on startup
  2. A new ISyntaxReceiver created for every generation, which selects interesting syntax
  3. A "generator" to turn an ISyntaxReceiver into source code
  4. A way to add the generated sources to the compilation
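To make those four pieces concrete, here is a minimal sketch of a V1 generator. Everything specific in it is hypothetical and for illustration only: the ExampleV1Generator and AttributedClassReceiver names, the MyTestThingAttribute marker, and the trivial "list the candidate classes" output. It also assumes the generator project compiles with C# 9 or later.

```csharp
using System.Collections.Generic;
using System.Text;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using Microsoft.CodeAnalysis.Text;

// Hypothetical V1 (ISourceGenerator) generator, for illustration only
[Generator]
public class ExampleV1Generator : ISourceGenerator
{
    // Hypothetical marker attribute, added as a "constant" source
    private const string MarkerSource = @"// <auto-generated/>
[System.AttributeUsage(System.AttributeTargets.Class)]
internal sealed class MyTestThingAttribute : System.Attribute { }";

    public void Initialize(GeneratorInitializationContext context)
    {
        // 1. Constant sources, registered once
        context.RegisterForPostInitialization(static ctx => ctx.AddSource(
            "MyTestThingAttribute.g.cs",
            SourceText.From(MarkerSource, Encoding.UTF8)));

        // 2. A fresh receiver is created for every generation
        context.RegisterForSyntaxNotifications(static () => new AttributedClassReceiver());
    }

    public void Execute(GeneratorExecutionContext context)
    {
        if (context.SyntaxReceiver is not AttributedClassReceiver receiver)
        {
            return;
        }

        // 3. Turn the collected syntax into source (here, just comments)
        var sb = new StringBuilder("// <auto-generated/>\n");
        foreach (ClassDeclarationSyntax candidate in receiver.Candidates)
        {
            sb.Append("// candidate: ").AppendLine(candidate.Identifier.Text);
        }

        // 4. Add the generated source to the compilation
        context.AddSource("Candidates.g.cs", SourceText.From(sb.ToString(), Encoding.UTF8));
    }

    private sealed class AttributedClassReceiver : ISyntaxReceiver
    {
        public List<ClassDeclarationSyntax> Candidates { get; } = new();

        // Called for every syntax node in the compilation, on every generation
        public void OnVisitSyntaxNode(SyntaxNode syntaxNode)
        {
            if (syntaxNode is ClassDeclarationSyntax c && c.AttributeLists.Count > 0)
            {
                Candidates.Add(c);
            }
        }
    }
}
```

The receiver only does a cheap syntax check and stores the node; any semantic analysis would happen later, in Execute().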

This implementation certainly looked the part on paper, and obviously lots of people started writing and using source generators. However, if your project got large, you might quickly run into performance issues…

The problems with performance for ISourceGenerator

The problem with the ISyntaxReceiver and ISourceGenerator approach is really in points 2 and 3 in the above list. As I mentioned, a new ISyntaxReceiver is created for every "generation" phase.

Well, it turns out there are a lot of generation phases. Like, a lot.

Whether it's as much as every key press in your editor, or debounced somewhat, there's just a huge amount of work being done by a generator. Imagine: every time you type something in your editor, a new instance of your ISyntaxReceiver is created, and it is passed every syntax node in the compilation. That's a lot of syntax. To give you a feel for it, these 5 lines of dummy C#

[MyTestThing]
public class TestSyntax
{
    public string Value { get; set; } = "default";
}

translates to a lot of syntax:

ClassDeclaration("TestSyntax")
.WithAttributeLists(
    SingletonList<AttributeListSyntax>(
        AttributeList(
            SingletonSeparatedList<AttributeSyntax>(
                Attribute(
                    IdentifierName("MyTestThing"))))))
.WithModifiers(TokenList(Token(SyntaxKind.PublicKeyword)))
.WithMembers(
    SingletonList<MemberDeclarationSyntax>(
        PropertyDeclaration(PredefinedType(Token(SyntaxKind.StringKeyword)), Identifier("Value"))
        .WithModifiers(TokenList(Token(SyntaxKind.PublicKeyword)))
        .WithAccessorList(
            AccessorList(
                List<AccessorDeclarationSyntax>(
                    new AccessorDeclarationSyntax[]{
                        AccessorDeclaration(SyntaxKind.GetAccessorDeclaration)
                            .WithSemicolonToken(Token(SyntaxKind.SemicolonToken)),
                        AccessorDeclaration(SyntaxKind.SetAccessorDeclaration)
                            .WithSemicolonToken(Token(SyntaxKind.SemicolonToken))})))
        .WithInitializer(EqualsValueClause(
            LiteralExpression(SyntaxKind.StringLiteralExpression,Literal("default"))))
        .WithSemicolonToken(Token(SyntaxKind.SemicolonToken))))))

Even if you're very efficient, take care not to use LINQ, and don't do any more work than you need to, you can't avoid the fact you're doing a lot every time someone types in their editor.

Interestingly, the LoggerMessage and System.Text.Json source generators added in .NET 6 were originally built using this API and ran into exactly this problem, as described in this issue:

VS typing performance quickly degrades (hangs, up to several seconds pauses between characters typing, etc.) after working with several cs files for like 20 minutes or more in the context of a medium-sized DotNet 6 Preview 6 solution with maybe 70 projects, each with ~ 10 to 500 files.

The really interesting part of that issue is the responses from the .NET team. For example, this comment:

It's extremely difficult to implement source generator caching in a way that is compatible with the semantics of the v1 (non-incremental) API. If the source generator contains a syntax receiver, this task should be treated as impossible; the source generator will not scale to large projects without migration to the v2 (incremental) API.

and this summary:

V1 source generation APIs have the characteristic of slow perf in large solutions, due to the SyntaxReceiver infrastructure which walks through all syntax trees every time source generators are invoked. This is true regardless of the implementation of the generator. V2 APIs (incremental source generators) are designed to solve this problem, but are (technically) only usable in VS 22.

So it's pretty clear that you shouldn't use the "old" ISourceGenerator API, and instead should shift to the new hotness, the V2 API coming in .NET 6/Roslyn 4.x, also known as "incremental generators".

Incremental generators in .NET 6

We'll start with the basics, to create an incremental generator in .NET 6:

  • Your project must target netstandard2.0
  • Your generator must implement IIncrementalGenerator
  • It must be decorated with the [Generator] attribute

So these requirements are almost the same as in .NET 5; you just have to implement IIncrementalGenerator instead of ISourceGenerator. You can find the design document/specification for incremental generators in the Roslyn repo on GitHub.

Note that you will also need to reference the 4.x version of the Microsoft.CodeAnalysis.CSharp.Workspaces NuGet package to access the interface.

The biggest change in incremental generators is that instead of having an ISyntaxReceiver and an Execute step that runs blindly for every generation, you define a "pipeline" of filters and transforms. This is typically what a "normal" source generator was doing anyway, but using a more rigid API provides an opportunity for performance improvements.

Specifically, the new generators use memoisation (AKA caching) to significantly reduce the cost of running your generator. The pipeline caches the output of each stage. If the inputs to a stage haven't changed since the previous run, the stage is skipped and its cached output is reused. Likewise, if a stage produces the same output as before, the stages that follow it can reuse their cached results too, right through to the previously generated output.
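As a rough mental model only, the following toy class memoises a single "stage". It is emphatically not how Roslyn implements its pipeline; CachedStage, Demo, and CountExecutions are hypothetical names I've made up for this sketch, and it uses nothing beyond the base class library.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Toy model of one cached pipeline stage. NOT Roslyn's implementation.
public sealed class CachedStage<TIn, TOut>
{
    private readonly Func<TIn, TOut> _transform;
    private readonly IEqualityComparer<TIn> _comparer;
    private bool _hasValue;
    private TIn _lastInput = default!;
    private TOut _lastOutput = default!;

    // How many times the (potentially expensive) transform actually ran
    public int Executions { get; private set; }

    public CachedStage(Func<TIn, TOut> transform, IEqualityComparer<TIn>? comparer = null)
    {
        _transform = transform;
        _comparer = comparer ?? EqualityComparer<TIn>.Default;
    }

    public TOut Run(TIn input)
    {
        // Same input as last time: skip the work and reuse the cached output
        if (_hasValue && _comparer.Equals(_lastInput, input))
        {
            return _lastOutput;
        }

        Executions++;
        _lastInput = input;
        _lastOutput = _transform(input);
        _hasValue = true;
        return _lastOutput;
    }
}

public static class Demo
{
    public static int CountExecutions()
    {
        var emit = new CachedStage<string, string>(name => $"partial class {name} {{ }}");

        emit.Run("TestSyntax"); // transform runs
        emit.Run("TestSyntax"); // unchanged input, cached output reused
        emit.Run("Other");      // input changed, transform runs again

        return emit.Executions;
    }
}
```

The key design point is that the cache relies on being able to compare inputs for equality, which is why what you pass between stages matters.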

This all sounds a bit abstract, and I found the API document somewhat difficult to grasp, so in the next section we'll take a basic look at what the API looks like for a case study: the LoggerMessage source generator.

Creating an incremental generator: the LoggerMessage source generator

I'm not going to reproduce all the code from the LoggerMessage source generator, as much of it is the same whether you're using the V1 or V2 APIs. Instead I'm going to focus on the important first few stages of the generator.

Before we start, the following diagram shows a rough outline of the stages in the LoggerMessage source generator:

Incremental generator outline

Let's start by looking at the basic generator class itself:

[Generator]
public class LoggerMessageGenerator : IIncrementalGenerator
{
    public void Initialize(IncrementalGeneratorInitializationContext context)
    {
        // ...
    }
}

The generator implements IIncrementalGenerator, which requires implementing a single Initialize() method. Note that there's no Execute() method now, in contrast to the previous ISourceGenerator interface.

The context parameter passed to Initialize() contains quite a few members, but we'll start with just two:

  • RegisterPostInitializationOutput()—similar to the V1 API, this allows you to register "constant" sources to the compilation. This is for sources that are always required, regardless of the syntax and the output of the generation. This isn't used by the LoggerMessageGenerator.
  • SyntaxProvider—typically the "starting point" for a generator. You can use it to start building a "pipeline" of variables for your generator.
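The LoggerMessageGenerator doesn't use RegisterPostInitializationOutput(), so here's a minimal sketch of what a call might look like in a different generator. The MarkerAttributeGenerator class, the Example namespace, and the MyTestThingAttribute marker are all hypothetical names chosen for illustration.

```csharp
using System.Text;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.Text;

// Hypothetical generator that only adds a constant marker attribute
[Generator]
public class MarkerAttributeGenerator : IIncrementalGenerator
{
    // Hypothetical marker attribute source; it never depends on user code
    private const string AttributeSource = @"// <auto-generated/>
namespace Example
{
    [System.AttributeUsage(System.AttributeTargets.Class)]
    internal sealed class MyTestThingAttribute : System.Attribute { }
}";

    public void Initialize(IncrementalGeneratorInitializationContext context)
    {
        // Registers a "constant" source that is always added to the compilation
        context.RegisterPostInitializationOutput(static ctx => ctx.AddSource(
            "MyTestThingAttribute.g.cs",
            SourceText.From(AttributeSource, Encoding.UTF8)));
    }
}
```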

As mentioned above, the SyntaxProvider is where we start the pipeline. We can create a new "stage" in the pipeline by calling CreateSyntaxProvider(), providing a predicate for filtering and a transform for selected nodes, as shown below:

IncrementalValuesProvider<ClassDeclarationSyntax> classDeclarations = context.SyntaxProvider
    .CreateSyntaxProvider(
        predicate: static (s, _) => IsSyntaxTargetForGeneration(s), 
        transform: static (ctx, _) => GetSemanticTargetForGeneration(ctx))
    .Where(static m => m is not null);

static bool IsSyntaxTargetForGeneration(SyntaxNode node)
    => node is MethodDeclarationSyntax m && m.AttributeLists.Count > 0;

static ClassDeclarationSyntax? GetSemanticTargetForGeneration(GeneratorSyntaxContext context)
{
    // ...
}

In this example, IsSyntaxTargetForGeneration() is a very simple predicate that checks whether the syntax node is interesting: it returns true only if the node is a method that has one or more attributes. The output of this stage will still include some syntax nodes we're not interested in (methods with unrelated attributes), but it filters out most of the syntax we don't care about.

It's important for this first stage in the pipeline to be very fast and not to allocate, as it will be called a lot. As best as I can tell, this predicate will be called every time you make a change in your editor.

For syntax nodes that pass the first stage, the transform method GetSemanticTargetForGeneration() is called on them. In this stage, we do a little more work, but importantly we reduce our output to only the nodes we care about:

private const string LoggerMessageAttribute = "Microsoft.Extensions.Logging.LoggerMessageAttribute";

static ClassDeclarationSyntax? GetSemanticTargetForGeneration(GeneratorSyntaxContext context)
{
    // we know the node is a MethodDeclarationSyntax thanks to IsSyntaxTargetForGeneration
    var methodDeclarationSyntax = (MethodDeclarationSyntax)context.Node;

    // loop through all the attributes on the method
    foreach (AttributeListSyntax attributeListSyntax in methodDeclarationSyntax.AttributeLists)
    {
        foreach (AttributeSyntax attributeSyntax in attributeListSyntax.Attributes)
        {
            IMethodSymbol attributeSymbol = context.SemanticModel.GetSymbolInfo(attributeSyntax).Symbol as IMethodSymbol;
            if (attributeSymbol == null)
            {
                // weird, we couldn't get the symbol, ignore it
                continue;
            }

            INamedTypeSymbol attributeContainingTypeSymbol = attributeSymbol.ContainingType;
            string fullName = attributeContainingTypeSymbol.ToDisplayString();

            // Is the attribute the [LoggerMessage] attribute?
            if (fullName == LoggerMessageAttribute)
            {
                // return the parent class of the method
                return methodDeclarationSyntax.Parent as ClassDeclarationSyntax;
            }
        }
    }

    // we didn't find the attribute we were looking for
    return null;
}

The output of the transform is either the parent ClassDeclarationSyntax of the method, or null if we're not interested in the method. The next stage of the pipeline is to filter out the non-null values:

.Where(static m => m is not null);

Note that although the Where() clause looks like LINQ, it's actually an optimised version for use in the source generator pipeline.

It's important to understand the pipeline we've defined so far:

  1. Start with all the syntax in the program
  2. Quickly restrict to only methods that have an attribute
  3. Transform each syntax to either the parent class, if the method has the [LoggerMessage] attribute, or return null.
  4. Filter out null values.

Stages 2-4 will run a lot, basically for every edit you make in your IDE. But unless you're actually adding something that will change generation, i.e. a method with the [LoggerMessage] attribute, the output of step 4 will remain constant. That means nothing else in your generator needs to execute. This is a really powerful point, as it can massively reduce the amount of work that needs to happen.

The remainder of the generator looks something like this:

// classDeclarations is the IncrementalValuesProvider<ClassDeclarationSyntax> from the previous snippet

IncrementalValueProvider<(Compilation, ImmutableArray<ClassDeclarationSyntax>)> compilationAndClasses 
    = context.CompilationProvider.Combine(classDeclarations.Collect());

context.RegisterSourceOutput(compilationAndClasses, 
    static (spc, source) => Execute(source.Item1, source.Item2, spc));

The next step in the pipeline "merges" the Compilation instance with the classDeclarations produced as the output of step 4 above. This is then fed into context.RegisterSourceOutput(), which calls the Execute() method below. This is where the meaty, expensive work of generation actually happens.

private static void Execute(Compilation compilation, ImmutableArray<ClassDeclarationSyntax> classes, SourceProductionContext context)
{
    if (classes.IsDefaultOrEmpty)
    {
        // nothing to do yet
        return;
    }

    IEnumerable<ClassDeclarationSyntax> distinctClasses = classes.Distinct();

    var p = new Parser(compilation, context.ReportDiagnostic, context.CancellationToken);
    
    IReadOnlyList<LoggerClass> logClasses = p.GetLogClasses(distinctClasses);
    if (logClasses.Count > 0)
    {
        var e = new Emitter();
        string result = e.Emit(logClasses, context.CancellationToken);

        context.AddSource("LoggerMessage.g.cs", SourceText.From(result, Encoding.UTF8));
    }
}

As I've tried to hammer home, if you've built your incremental generator well, most of the time, your source generator won't be running the code in Execute().

In terms of converting an existing ISourceGenerator to an IIncrementalGenerator, the equivalent of Execute() is where you would put most of your logic. The important thing would be to consider how to add the additional early filtering stages to keep your generator responsive even in large solutions!
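As a rough illustration of that shape, here's a minimal sketch of a hypothetical generator written against the V2 API. The ExampleIncrementalGenerator name and its trivial "list every class that has an attribute" output are made up for this example, and passing plain strings between stages is simply a choice made for this sketch. The cheap syntax check sits in the predicate, and what would have been the body of the old Execute() is called from RegisterSourceOutput().

```csharp
using System.Collections.Immutable;
using System.Text;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using Microsoft.CodeAnalysis.Text;

// Hypothetical incremental generator, for illustration only
[Generator]
public class ExampleIncrementalGenerator : IIncrementalGenerator
{
    public void Initialize(IncrementalGeneratorInitializationContext context)
    {
        // Early filtering: a cheap syntax-only predicate, then a transform
        // that extracts only the data needed later (here, the class name)
        IncrementalValuesProvider<string> classNames = context.SyntaxProvider
            .CreateSyntaxProvider(
                predicate: static (node, _) =>
                    node is ClassDeclarationSyntax c && c.AttributeLists.Count > 0,
                transform: static (ctx, _) =>
                    ((ClassDeclarationSyntax)ctx.Node).Identifier.Text);

        // The logic that used to live in ISourceGenerator.Execute() goes here
        context.RegisterSourceOutput(
            classNames.Collect(),
            static (spc, names) => Execute(names, spc));
    }

    private static void Execute(ImmutableArray<string> names, SourceProductionContext context)
    {
        if (names.IsDefaultOrEmpty)
        {
            // nothing to do yet
            return;
        }

        var sb = new StringBuilder("// <auto-generated/>\n");
        foreach (string name in names)
        {
            sb.Append("// candidate: ").AppendLine(name);
        }

        context.AddSource("Candidates.g.cs", SourceText.From(sb.ToString(), Encoding.UTF8));
    }
}
```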

Summary

In this post I described how source generators worked in .NET 5, along with some of the APIs available. I then described why this design was problematic, causing performance issues when used with large projects. Finally, I discussed the new incremental generators API, and how the new LoggerMessage source generator uses this API to minimise the work it does by caching the results of intermediate stages.

Andrew Lock | .Net Escapades