Redacting sensitive data in logs with Microsoft.Extensions.Compliance.Redaction

In my previous two posts I've been looking at new logging features added in .NET 8 by way of the Microsoft.Extensions.Telemetry and Microsoft.Extensions.Telemetry.Abstractions packages. In this post we look at a different, but related, package: Microsoft.Extensions.Compliance.Redaction.

Why do you need redaction?

If you're just getting started with ASP.NET Core, "redaction" isn't something you'll likely worry about. The ease with which you can write logs in ASP.NET Core, and the ease with which libraries can plug in to your application logs (now that everyone's settled on the "default" Microsoft abstractions as a baseline) are great for debugging and understanding your app.

However, once you start putting things into production, especially if you're working for a privacy- and/or security-conscious company, you have to start being more careful. You need to think about each thing you write to your logs: is it ok for that to be stored in plain-text and potentially shipped off to a third-party system?

For some things, the answer is an obvious and emphatic "no". It's never acceptable to write passwords, access tokens, or private keys into logs.

In other cases, the answer might be "yes, but…".

For example, there are various laws, such as the General Data Protection Regulation (GDPR) in the EU, which govern what data you're allowed to process, when you need consent, and so on. Additionally, there's a difference between personal data and sensitive data. Sensitive data is personal data that you need to handle even more carefully. This includes categories such as health data, religious beliefs, sexual orientation etc.

Writing personal or sensitive data into your logs would likely be a breach of the provisions that protect consumers, so it's very unlikely you'd want to record that data in your logs. But new features like [LogProperties] make it easy to "leak" data like this. The simplicity with which you can pass an object to a log message and have the source generator write all its properties is at odds with the "careful and controlled" approach you need to take with customer data.

So that's where Microsoft.Extensions.Compliance.Redaction comes in.

Adding redaction with Microsoft.Extensions.Compliance.Redaction

The Microsoft.Extensions.Compliance.Redaction package (and the Microsoft.Extensions.Compliance.Abstractions package it builds on) provides a general-purpose framework for adding redaction to your application. In this post I'm focusing on logging, but the functionality is more general than that.

In later posts we'll see how you can enable redaction in other parts of the framework.

To enable redaction in your logs, you need to do 3 things:

Define your data classification taxonomy i.e. what types of data do you have?
Add and enable the redaction services for the logger.
Classify all the data in your models using your taxonomy.

That's all a bit abstract, so we'll walk through an example to understand what it all means.

Reviewing the sample app

For this example, I'm going to continue with the simplified WeatherForecast app I've used in the previous two posts. We'll start with an app that looks like this:

var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

// Minimal API using the static handler pattern 👇
app.MapGet("/weatherforecast", Handler.GetForecasts);

app.Run();

// Note  👇 The handler must be partial when you're using the source generator
internal partial class Handler
{ 
    // Inject the ILogger instance using DI     👇
    public static WeatherForecast[] GetForecasts(ILogger<Handler> logger)
    {
        var entriesToGenerate = Random.Shared.Next(5); // Generate a number between 0-4

        // Generate the instances
        var entries = new WeatherForecast[entriesToGenerate];
        for (int i = 0; i < entriesToGenerate; i++)
        {
            var forecast = new WeatherForecast
            (
                Date: DateOnly.FromDateTime(DateTime.Now.AddDays(i)),
                TemperatureC: Random.Shared.Next(-20, 55)
            );

            GeneratedForecast(logger, i, forecast); // 👈 Log the forecast
            entries[i] = forecast;
        }

        return entries;

    }

    // Use the source generator to log the number of forecasts
    [LoggerMessage(Level = LogLevel.Debug, Message = "Generating forecast {EntryNumber}")]
    private static partial void GeneratedForecast(
      ILogger logger,
      int entryNumber,
      [LogProperties] WeatherForecast forecast);
}

// The slightly simplified WeatherForecast record
internal record WeatherForecast(DateOnly Date, int TemperatureC)
{
    public int TemperatureF => 32 + (int)(TemperatureC / 0.5556);
}

This app has a single API that returns 0-4 weather forecasts. We're using [LogProperties] to log the properties of each WeatherForecast instance as they're generated.

When we run the app, the generated logs look something like this:

{
  "EventId": 0,
  "LogLevel": "Debug",
  "Category": "Handler",
  "Message": "Generating forecast 0",
  "State": {
    "Message": "{OriginalFormat}=Generating forecast {EntryNumber},forecast.TemperatureF=-3,forecast.TemperatureC=-20,forecast.Date=12/02/2023,EntryNumber=0",
    "{OriginalFormat}": "Generating forecast {EntryNumber}",
    "forecast.TemperatureF": -3,
    "forecast.TemperatureC": -20,
    "forecast.Date": "12/02/2023",
    "EntryNumber": 0
  }
}

So that's our starting point. Now let's go through the steps to add redaction to the app.

1. Defining a data classification taxonomy

Before doing anything else you need to decide what types of data you need to handle. For example, you might decide you have three types of data:

Sensitive data
Personally Identifiable Data
Other data

You may well have more categories than this, and these might not be applicable for you. Ultimately, this is a compliance issue, so you'll certainly need a conversation with business owners and quite possibly lawyers to figure out exactly what types of data you will be processing, as well as your requirements and obligations for working with it.

We'll stick with these three types of data for our taxonomy, and for simplicity, I'm going to treat "other data" as non-sensitive, non-personal data that we don't need to handle in any special way. That data can be written to logs without any concerns.

To define your taxonomy, first add references to the Microsoft.Extensions.Compliance.Redaction and Microsoft.Extensions.Telemetry packages using the .NET CLI:

dotnet add package Microsoft.Extensions.Compliance.Redaction
dotnet add package Microsoft.Extensions.Telemetry

Alternatively, add the package references directly to your csproj:

<Project Sdk="Microsoft.NET.Sdk.Web">

  <PropertyGroup>
    <TargetFramework>net8.0</TargetFramework>
    <Nullable>enable</Nullable>
    <ImplicitUsings>enable</ImplicitUsings>
    <InvariantGlobalization>true</InvariantGlobalization>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="Microsoft.Extensions.Telemetry" Version="8.0.0" />
    <PackageReference Include="Microsoft.Extensions.Compliance.Redaction" Version="8.0.0" />
  </ItemGroup>

</Project>

Technically you only need to add the Microsoft.Extensions.Compliance.Abstractions package to define your data taxonomy, but we're going to need the above packages later anyway, and they bring in all the necessary dependencies.

You define your taxonomy as a collection of readonly struct DataClassification instances. The approach taken in Microsoft's sample/testing packages is to define these in a static class, similar to the following:

public static class MyTaxonomy
{
    public static string TaxonomyName => typeof(MyTaxonomy).FullName!;

    public static DataClassification SensitiveData => new(TaxonomyName, nameof(SensitiveData));

    public static DataClassification PersonalData => new(TaxonomyName, nameof(PersonalData));
}

The above code defines our two data types: SensitiveData and PersonalData. To make these classifications easier to work with, we also create two corresponding attributes. These derive from the DataClassificationAttribute attribute and take a single classification instance:


public class SensitiveDataAttribute : DataClassificationAttribute
{
    public SensitiveDataAttribute() : base(MyTaxonomy.SensitiveData) { }
}

public class PersonalDataAttribute : DataClassificationAttribute
{
    public PersonalDataAttribute() : base(MyTaxonomy.PersonalData) { }
}

As well as our custom data classifications, there are two "built-in" classifications defined as part of DataClassification:

public readonly struct DataClassification : IEquatable<DataClassification>
{
    /// <summary>
    /// Gets the value to represent data with no defined classification.
    /// </summary>
    public static DataClassification None => new(nameof(None));

    /// <summary>
    /// Gets the value to represent data with an unknown classification.
    /// </summary>
    public static DataClassification Unknown => new(nameof(Unknown));
    
    //...
}

These can be used as part of your data taxonomy and have corresponding attributes: [NoDataClassification] and [UnknownDataClassification].

2. Adding and configuring the redaction services

Now that you've defined your taxonomy, you can add the redaction services to your app's dependency injection container. You'll also need to decide how each type of data will be redacted. For example, you could choose to redact the data in your logs by:

Replacing it with an empty string
Replacing it with a fixed string ****
Encrypting the data
Hashing the data

The Microsoft.Extensions.Compliance.Redaction package includes two redactor implementations:

ErasingRedactor. This replaces the data with an empty string.
HmacRedactor. This replaces the data with a keyed HMAC256 hash of the data.

The erasing redactor is clearly the most thorough approach, because whatever the input, you get the same output: an empty string! The HMAC256 redactor, on the other hand, has the property that an identical input will give an identical output. That means you can correlate values, so you can tell that two values were the same even though you won't know what the actual original value was.

Based on those details, let's choose the following redactors:

For SensitiveData, use the ErasingRedactor.
For PersonalData use the HmacRedactor.

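using System.Text; // for Encoding
using Microsoft.Extensions.Compliance.Classification; // for DataClassificationSet
using Microsoft.Extensions.Compliance.Redaction; // for ErasingRedactor

var builder = WebApplication.CreateBuilder(args);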

// 👇 Enable redaction of `[LogProperties]` objects
builder.Logging.EnableRedaction();
// Add the redaction services
builder.Services.AddRedaction(x =>
{
    // 👇 Enable the erasing redactor for sensitive data
    x.SetRedactor<ErasingRedactor>(new DataClassificationSet(MyTaxonomy.SensitiveData));  

    // 👇 Enable the HMAC256 redactor for personal data
    // Note that the HMAC redactor is experimental, so you have to explicitly
    // acknowledge that with a #pragma
#pragma warning disable EXTEXP0002 // Type is for evaluation purposes only
    x.SetHmacRedactor(hmacOpts =>
    {
        // ⚠ Don't do this in a real project - you need to load these values
        // from a secure secret store via configuration!
        hmacOpts.Key = Convert.ToBase64String(Encoding.UTF8.GetBytes("Some super secret key that's really long for security"));
        hmacOpts.KeyId = 123;
    }, new DataClassificationSet(MyTaxonomy.PersonalData));
});

The snippet above:

Enables [LogProperties] redaction
Adds the redaction services
Configures SensitiveData to use the ErasingRedactor
Configures PersonalData to use the HmacRedactor

As mentioned in the snippet, don't hard-code the key in your code like this 😅 This is strictly for demo purposes. You can use all the standard support for IOptions<> and IConfigurationSource to store the key in a secure location and read it at runtime.
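As a rough sketch of what that might look like, the following reads the key from configuration instead of hard-coding it. The Redaction:HmacKey and Redaction:HmacKeyId configuration keys are names I've made up for illustration, and I'm assuming the values come from something like user secrets, environment variables, or a key vault configuration provider:

```csharp
using Microsoft.Extensions.Compliance.Classification;
using Microsoft.Extensions.Compliance.Redaction;

var builder = WebApplication.CreateBuilder(args);

// "Redaction:HmacKey" and "Redaction:HmacKeyId" are hypothetical configuration keys.
// The key is assumed to be stored base64-encoded, as in the hard-coded demo.
string hmacKey = builder.Configuration["Redaction:HmacKey"]
    ?? throw new InvalidOperationException("Missing configuration value 'Redaction:HmacKey'");
int hmacKeyId = builder.Configuration.GetValue<int>("Redaction:HmacKeyId");

builder.Logging.EnableRedaction();
builder.Services.AddRedaction(x =>
{
#pragma warning disable EXTEXP0002 // Type is for evaluation purposes only
    x.SetHmacRedactor(hmacOpts =>
    {
        hmacOpts.Key = hmacKey;
        hmacOpts.KeyId = hmacKeyId;
    }, new DataClassificationSet(MyTaxonomy.PersonalData));
#pragma warning restore EXTEXP0002
});
```

Failing fast when the key is missing is a deliberate choice in this sketch: I'd rather the app refuse to start than run with a misconfigured redactor.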

The KeyId in the HMAC options is an "identifier" used to identify the key. It is prepended to redacted values, and means you can tell whether two values were redacted using the same key. If the same key was used to redact two values, and the redacted values are different, then you know the original values were also different. If comparing two values with different keys, you can't draw the same conclusion.
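To make that reasoning concrete, here's a small standalone sketch using the framework's HMACSHA256 type. To be clear, this is not the HmacRedactor implementation (the Hmac helper, the keys, and the sample values are all made up for illustration, and the real redactor's output format differs); it only demonstrates the general property that the same key and input give the same output, while a different key gives an unrelated output:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Illustration only: a hypothetical helper, NOT the library's HmacRedactor.
// It prefixes the hash with a key ID, mimicking the "{KeyId}:{Hmac}" shape.
static string Hmac(int keyId, byte[] key, string value)
{
    byte[] hash = HMACSHA256.HashData(key, Encoding.UTF8.GetBytes(value));
    return $"{keyId}:{Convert.ToBase64String(hash)}";
}

// Hypothetical demo keys; never hard-code real keys
byte[] keyA = Encoding.UTF8.GetBytes("demo key A, for illustration only");
byte[] keyB = Encoding.UTF8.GetBytes("demo key B, for illustration only");

string first = Hmac(123, keyA, "alice@example.com");
string second = Hmac(123, keyA, "alice@example.com");
string other = Hmac(123, keyA, "bob@example.com");
string rotated = Hmac(456, keyB, "alice@example.com");

Console.WriteLine(first == second);  // True: same key, same input
Console.WriteLine(first == other);   // False: same key, different input
Console.WriteLine(first == rotated); // False: different key, so no conclusion can be drawn
```

The last comparison is the important one: the two inputs were identical, but because the keys differ, the outputs tell you nothing about whether the original values matched.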

Ok, you have a taxonomy, and you've enabled redaction. All that remains is to mark up your data.

3. Classifying your data

This final step is to decorate your data models with the [SensitiveData] and [PersonalData] attributes. That's where this demo goes a bit off the rails because we're going to mark up our made-up WeatherForecast 😅

Remember you can also use the [NoDataClassification] and [UnknownDataClassification] attributes to classify data with the built-in None and Unknown classifications if you wish.

For demo purposes, let's assume that the TemperatureC property is sensitive and the TemperatureF property is personal. We'll leave the date unclassified.

internal record WeatherForecast(
    DateOnly Date, // Unclassified
    [SensitiveData] int TemperatureC) // Sensitive data
{
    [PersonalData] // Personal data
    public int TemperatureF => 32 + (int)(TemperatureC / 0.5556);
}

All that remains is to give it a try!

Testing the redaction

As a reminder, before we added redaction, our log messages looked like this:

{
  "EventId": 0,
  "LogLevel": "Debug",
  "Category": "Handler",
  "Message": "Generating forecast 0",
  "State": {
    "Message": "{OriginalFormat}=Generating forecast {EntryNumber},forecast.TemperatureF=-3,forecast.TemperatureC=-20,forecast.Date=12/02/2023,EntryNumber=0",
    "{OriginalFormat}": "Generating forecast {EntryNumber}",
    "forecast.TemperatureF": -3,
    "forecast.TemperatureC": -20,
    "forecast.Date": "12/02/2023",
    "EntryNumber": 0
  }
}

If we run the application again, they now look like this:

{
  "EventId": 0,
  "LogLevel": "Debug",
  "Category": "Handler",
  "Message": "Generating forecast 3",
  "State": {
    "Message": "Microsoft.Extensions.Logging.ExtendedLogger\u002BModernTagJoiner",
    "{OriginalFormat}": "Generating forecast {EntryNumber}",
    "forecast.Date": "12/05/2023",
    "EntryNumber": 3,
    "forecast.TemperatureF": "123:cNhjHrN5uLpvybrNxLdWsg==", //👈 Redacted with HMAC256
    "forecast.TemperatureC": "" // 👈 Redacted
  }
}

There are various things to note here:

The "sensitive" TemperatureC property has been "erased".
The "personal" TemperatureF property has been replaced with $"{KeyId}:{Hmac}".
The "normal" Date property is rendered as normal.
The State.Message property has an unhelpful utility class name, as it did in the previous post 😅.
The int values previously rendered were converted to strings during the redaction (the forecast.TemperatureF property was previously an int, now it's a string).

And that pretty much covers it. The main decisions to make are around your data taxonomy and how to redact the various classifications. After that, it's "just" a case of applying it to every object you pass with [LogProperties]. Which, you know, could be a lot!

Applying data classifications with `[TagProvider]`

In my previous post I showed that you could use [TagProvider] to control how objects are logged instead of using [LogProperties] and [LogPropertyIgnore]. If you're using this approach, you can apply data classifications at the same time. For example, the following provider would configure the same rules we described for the WeatherForecast object.

internal static class WeatherForecastTagProvider
{
    public static void RecordTags(ITagCollector collector, WeatherForecast forecast)
    {
        collector.Add(nameof(forecast.TemperatureC), forecast?.TemperatureC, 
            new DataClassificationSet(MyTaxonomy.SensitiveData)); // 👈 TemperatureC is sensitive
        collector.Add(nameof(forecast.TemperatureF), forecast?.TemperatureF, 
            new DataClassificationSet(MyTaxonomy.PersonalData)); // 👈 TemperatureF is personal
        collector.Add(nameof(forecast.Date), forecast?.Date); // 👈 Date has no classification
    }
}

This achieves the same result as applying the [PersonalData] and [SensitiveData] attributes does when you use [LogProperties].

Note that you must apply the data classifications here if you're using [TagProvider]. Tags added using [TagProvider] ignore the [PersonalData] and [SensitiveData] attributes applied to the WeatherForecast object.
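For completeness, this is roughly how the provider would be wired up to the log method in the sample app. It's a sketch that assumes the [TagProvider] attribute takes the provider type and method name, as I described in the previous post, and it reuses the Handler, WeatherForecast, and WeatherForecastTagProvider types from this post:

```csharp
using Microsoft.Extensions.Logging;

internal partial class Handler
{
    [LoggerMessage(Level = LogLevel.Debug, Message = "Generating forecast {EntryNumber}")]
    private static partial void GeneratedForecast(
        ILogger logger,
        int entryNumber,
        // 👇 Use the tag provider (and its classifications) instead of [LogProperties]
        [TagProvider(typeof(WeatherForecastTagProvider), nameof(WeatherForecastTagProvider.RecordTags))]
        WeatherForecast forecast);
}
```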

Creating a custom redactor

Given how important redaction is, I would be wary about trying to implement your own redactor. That said, some trivial redactors will be fine. For example, the following redactor is an alternative to the ErasingRedactor which writes the string REDACTED instead of an empty string:

// Derive from the Redactor base class in Microsoft.Extensions.Compliance.Abstractions
public class MyErasingRedactor : Redactor
{
    private const string ErasedValue = "REDACTED"; // Use this value for sensitive data

    public override int GetRedactedLength(ReadOnlySpan<char> input)
        => ErasedValue.Length;

    public override int Redact(ReadOnlySpan<char> source, Span<char> destination)
    {
        // The base class ensures destination has sufficient capacity
        ErasedValue.CopyTo(destination);
        return ErasedValue.Length;
    }
}
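If you want to sanity-check a redactor like this outside of the logging pipeline (in a unit test, for example), you can call it directly. This is a minimal sketch that assumes the MyErasingRedactor class above is in scope and relies on the string-based Redact() helper on the Redactor base class:

```csharp
using System;
using Microsoft.Extensions.Compliance.Redaction;

// Assumes MyErasingRedactor (defined above) is available in this project
Redactor redactor = new MyErasingRedactor();

// The input value is ignored by this redactor, so any non-empty input
// should produce the same fixed replacement
string redacted = redactor.Redact("-20");
Console.WriteLine(redacted); // I'd expect this to print REDACTED
```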

We can then register this redactor in place of the ErasingRedactor for redacting sensitive data:

var builder = WebApplication.CreateBuilder(args);
builder.Logging.EnableRedaction();
builder.Services.AddRedaction(x =>
{
    x.SetRedactor<MyErasingRedactor>(new DataClassificationSet(MyTaxonomy.SensitiveData));
    // ...
});
// ...

As expected, the results (simplified below) show that TemperatureC now has the fixed REDACTED value.

{
  "State": {
    "forecast.Date": "12/05/2023",
    "forecast.TemperatureF": "123:2Ofr9l6D9njTj5oFTOQkWg==",
    "forecast.TemperatureC": "REDACTED"
  }
}

You can also configure your redactor as the "fallback" redactor. The logger uses the fallback redactor whenever it encounters classified data that doesn't have an explicit redaction configuration (such as DataClassification.Unknown in our example).

var builder = WebApplication.CreateBuilder(args);
builder.Logging.EnableRedaction();
builder.Services.AddRedaction(x =>
{
    x.SetFallbackRedactor<MyErasingRedactor>();
    // ...
});
// ...

The default fallback redactor is the built-in ErasingRedactor.

Things to watch out for

Before we finish, I just want to flag a few sharp edges I found playing with redaction initially.

1. Enabling redaction without redactors is bad

When I first tried out the log redaction, I set up my data taxonomy, applied it to WeatherForecast and called builder.Logging.EnableRedaction(). However, when I checked the logs, the results were definitely not redacted:

{
  "State": {
    "forecast.TemperatureC": "-11:forecast.TemperatureC",
    "forecast.Date": "11/28/2023",
    "forecast.TemperatureF": "13:forecast.TemperatureF"
  }
}

The problem was that I didn't call builder.Services.AddRedaction(), so the redaction system was using the NullRedactor. That's particularly problematic because AddRedaction() doesn't even show up in IntelliSense until you add the Microsoft.Extensions.Compliance.Redaction package, so it's not at all obvious you're doing something wrong 😬

To be honest, I think defaulting to the NullRedactor is a mistake here. A better option IMO would be to not register a default, and to just throw. This would give guidance about what's missing. Unfortunately, I think that ship has likely sailed.

Another interesting point this revealed is that the redactor appends the name of the property to the data value by default. This prevents "accidental" correlation between different properties, which may unintentionally reveal correlations within records.

2. You must provide data classifications when calling `SetRedactor`

Once I'd figured out my mistake, I tried to add the HmacRedactor, by calling SetHmacRedactor(). Unfortunately, it didn't seem to do anything.

This was another mistake of mine. The API for SetHmacRedactor() is:

public static IRedactionBuilder SetHmacRedactor(
  this IRedactionBuilder builder, 
  Action<HmacRedactorOptions> configure,
  params DataClassificationSet[] classifications);

Note that the classifications argument is a params array. That means it can have 0+ values.

I assumed that meant you could omit the parameter entirely, and it would apply to all classifications, but no. That actually means it applies to no classifications, so it's never used.

Behind the scenes this implementation actually calls SetRedactor<T>(params DataClassificationSet[]) so my complaint is actually with that method. If passing an empty array to this method doesn't make any sense, you shouldn't be able to do it IMO 🤷‍♂️
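The behaviour makes sense once you remember how params works in C#. The following standalone sketch uses a made-up SetRedactor local function and a plain dictionary as a stand-in for the real registration (it is not the library's code), but it shows why omitting the argument registers nothing:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the redactor registry, keyed by classification name
var redactors = new Dictionary<string, string>();

void SetRedactor(string redactorName, params string[] classifications)
{
    // When the caller omits the argument, 'classifications' is an empty array,
    // so this loop never runs and nothing is registered
    foreach (string classification in classifications)
    {
        redactors[classification] = redactorName;
    }
}

SetRedactor("HmacRedactor");                 // No classifications: a silent no-op
Console.WriteLine(redactors.Count);          // 0

SetRedactor("HmacRedactor", "PersonalData"); // One classification: registered
Console.WriteLine(redactors.Count);          // 1
```

An omitted params argument is indistinguishable from an explicitly empty array, which is why the compiler can't warn you about it.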

3. Only `[LogProperties]` and `[TagProvider]` values are redacted

This one is probably obvious, but I think it's worth calling out specifically. Logging redaction only applies to objects you decorate with [LogProperties] and [TagProvider] and only when you use the [LoggerMessage] source generator. You won't get redaction if you:

Use the ILogger.LogDebug() etc extension methods (i.e. you don't use the source generator)
Pass objects or properties as template parameters in the log message.

For that latter point, I mean that if you have something like this:

[LoggerMessage(Level = LogLevel.Debug, Message = "Temperature: {Temp}")]
private static partial void GeneratedForecast(ILogger logger,  int temp);

and you call it like this:

ILogger logger = // ...
WeatherForecast forecast = // ...
GeneratedForecast(logger, forecast.TemperatureC); // 👈 This won't be redacted

Then the TemperatureC will be written to the logs, regardless of whether it has a data classification.

And with that, we've finally reached the end of this post on redaction!

Summary

In this post I showed how you can add redaction to your [LogProperties] and [TagProvider] objects using the Microsoft.Extensions.Compliance.Redaction package. To redact these objects you must first create a data taxonomy, defining different types of data. You then add the redaction services to your app, and define how each type of data should be redacted. Finally, you either decorate your objects with attributes describing their data, or assign a data classification in your [TagProvider] implementation.

Andrew Lock | .NET Escapades