Creating standard and "observable" instruments

In the first post in this series I provided an introduction to the System.Diagnostics.Metrics APIs introduced in .NET 6. I briefly introduced the concept of "observable" Instruments in that post, but didn't go into detail. In this post, we'll look at what being "observable" means, and how these Instruments differ from non-observable Instruments.

I start the post with a quick refresher on the basics of the System.Diagnostics.Metrics APIs, such as the different types of instruments available. I then show how you can create each of the instrument types and produce values from them.

System.Diagnostics.Metrics APIs

The System.Diagnostics.Metrics APIs were introduced in .NET 6 but are available in earlier runtimes (including .NET Framework) by using the System.Diagnostics.DiagnosticSource NuGet package. There are two primary concepts exposed by these APIs: Instrument and Meter:

Instrument: An instrument records the values for a single metric of interest. You might have separate Instruments for "products sold", "invoices created", "invoice total", or "GC heap size".
Meter: A Meter is a logical grouping of multiple instruments. For example, the System.Runtime Meter contains multiple Instruments about the workings of the runtime, while the Microsoft.AspNetCore.Hosting Meter contains Instruments about the HTTP requests received by ASP.NET Core.

There are also (currently, as of .NET 10) 7 different types of Instrument:

Counter<T>
ObservableCounter<T>
UpDownCounter<T>
ObservableUpDownCounter<T>
Gauge<T>
ObservableGauge<T>
Histogram<T>

To create a custom metric, you need to choose the type of Instrument to use, and associate it with a Meter. I'll discuss the differences between each of these instruments shortly, but first we'll look at the difference between "observable" instruments, and "normal" instruments.
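As a rough sketch of that step, the following wires a Counter<long> to a Meter. The StoreMetrics class, the MyApp.Store meter name, the myapp.products.sold metric name, and the product.category tag are all hypothetical names I've made up for illustration; none of them come from a real library.

```csharp
using System.Collections.Generic;
using System.Diagnostics.Metrics;

public sealed class StoreMetrics
{
    // Hypothetical meter name, used by consumers to find these instruments.
    public const string MeterName = "MyApp.Store";

    private readonly Counter<long> _productsSold;

    public StoreMetrics(Meter meter)
    {
        // Creating the instrument from the Meter is what associates the two.
        _productsSold = meter.CreateCounter<long>(
            "myapp.products.sold",
            unit: "{product}",
            description: "Number of products sold.");
    }

    // Records one sale (or several at once) with a single descriptive tag.
    public void ProductSold(string category, int quantity = 1) =>
        _productsSold.Add(quantity, new KeyValuePair<string, object?>("product.category", category));
}
```

You would typically create the Meter once, for example with new Meter(StoreMetrics.MeterName), share a single StoreMetrics instance, and call ProductSold() wherever a sale completes.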

What is an `Observable*` instrument?

When using the System.Diagnostics.Metrics APIs there's a "producer" side and a "consumer" side. The producer of metrics is the app itself, recording values and details about how it's operating. The consumer could be an in-process consumer, such as the OpenTelemetry libraries, or it could be an external process, such as dotnet-counters or dotnet-monitor.

The differences between a "normal" instrument and an "observable" instrument stem from who controls when and how a value is emitted:

For "normal" instruments, the producer emits values as they occur. For example, when a request is received, ASP.NET Core emits the http.server.active_requests metric, indicating a new request is in-flight.
For "observable" instruments, the consumer side asks for the value. For example, the dotnet.gc.pause.time metric returns "The total amount of time paused in GC since the process has started", but only when you ask for it.

In general, observable instruments are used when you have an effectively continuous value that it wouldn't make sense for the producer to actively emit, such as the dotnet.gc.pause.time above, or where emitting all of the intermediate values would be too expensive from a performance point of view.

Technically, you could potentially emit this metric every time the GC pauses, but given that these values are more fine-grained than you would likely want anyway, it's much more efficient to allow the consumer to "poll" the values on demand, and therefore it makes the most sense as an observable instrument.
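To make the push/pull distinction concrete, here's a minimal sketch that puts one instrument of each kind side by side. The JobMetrics class and both metric names are hypothetical, and I'm assuming a simple long field is good enough to hold the running total.

```csharp
using System.Diagnostics.Metrics;
using System.Threading;

public sealed class JobMetrics
{
    private readonly Counter<long> _jobsStarted;
    private long _bytesProcessed; // running total, updated on the hot path

    public JobMetrics(Meter meter)
    {
        // "Normal" instrument: the producer pushes a measurement every time the event occurs.
        _jobsStarted = meter.CreateCounter<long>("myapp.jobs.started", unit: "{job}");

        // "Observable" instrument: this callback only runs when a consumer asks for the value.
        meter.CreateObservableCounter(
            "myapp.jobs.bytes_processed",
            () => Interlocked.Read(ref _bytesProcessed),
            unit: "By");
    }

    // Emits a measurement immediately.
    public void JobStarted() => _jobsStarted.Add(1);

    // Called very frequently, so it only updates a field; no measurement is emitted here.
    public void BytesProcessed(int count) => Interlocked.Add(ref _bytesProcessed, count);
}
```

In this sketch, calling BytesProcessed() thousands of times per second costs little more than an interlocked add, and the consumer decides how often it wants to sample the total.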

Now we understand the difference between observable and normal instruments, let's walk through all the instrumentation types and see how they're used in the .NET base class libraries.

Understanding the different `Instrument` types

So far in this series we've used a simple Counter<T> that records every time a given event occurs. In this post we'll look at each of the possible Instruments in turn, showing how you create an instrument of that type to produce a given metric. Where possible, I'm showing places within the .NET or ASP.NET Core libraries that use each of these instruments, to give "real world" versions of how these are used.

`Counter<T>`

The Counter<T> instrument is one of the simplest instruments conceptually. It is used to record how many times a given event occurs.

For example, the aspnetcore.diagnostics.exceptions metric is a Counter<long> which records the "Number of exceptions caught by exception handling middleware."

_handlerExceptionCounter = _meter.CreateCounter<long>(
    "aspnetcore.diagnostics.exceptions",
    unit: "{exception}",
    description: "Number of exceptions caught by exception handling middleware.");

Every time the ExceptionHandlerMiddleware (or DeveloperExceptionPageMiddleware) catches an exception, it adds 1 to this counter, first constructing an appropriate set of tags, and then calling Add(1, tags):

private void RequestExceptionCore(string exceptionName, ExceptionResult result, string? handler)
{
    var tags = new TagList();
    tags.Add("error.type", exceptionName);
    tags.Add("aspnetcore.diagnostics.exception.result", GetExceptionResult(result));
    if (handler != null)
    {
        tags.Add("aspnetcore.diagnostics.handler.type", handler);
    }
    _handlerExceptionCounter.Add(1, tags);
}

As this Counter<T> is tracking a number of occurrences, you're always adding positive values, never negative values, though you can increase by more than 1 at a time if needs be.

`ObservableCounter<T>`

The ObservableCounter<T> is conceptually similar to a Counter<T>, in that it records monotonically increasing values. Being an "observable" instrument, it only records the values when "observed" (we'll look at how to observe the instruments in your own code in a subsequent post).

For example, the dotnet.gc.heap.total_allocated metric is an ObservableCounter<long> which records "The approximate number of bytes allocated on the managed GC heap since the process has started":

s_meter.CreateObservableCounter(
    "dotnet.gc.heap.total_allocated",
    () => GC.GetTotalAllocatedBytes(),
    unit: "By",
    description: "The approximate number of bytes allocated on the managed GC heap since the process has started. The returned value does not include any native allocations.");

When observed, the lambda included in the definition is called, which invokes GC.GetTotalAllocatedBytes(). Note that this value steadily increases during the lifetime of the app, so it's not returning the difference since last invocation, it's returning the current running total.

`UpDownCounter<T>`

The UpDownCounter<T> is similar to the Counter<T>, but it supports reporting positive or negative values.

For example, the http.server.active_requests metric is an UpDownCounter<long> that records the "Number of active HTTP server requests.":

_activeRequestsCounter = _meter.CreateUpDownCounter<long>(
    "http.server.active_requests",
    unit: "{request}",
    description: "Number of active HTTP server requests.");

When a request is started, the server calls Add() and increments the value of the counter:

public void RequestStart(string scheme, string method)
{
    // Tags must match request end.
    var tags = new TagList();
    InitializeRequestTags(ref tags, scheme, method);
    _activeRequestsCounter.Add(1, tags);
}

private static void InitializeRequestTags(ref TagList tags, string scheme, string method)
{
    tags.Add(HostingTelemetryHelpers.AttributeUrlScheme, scheme);
    tags.Add(HostingTelemetryHelpers.AttributeHttpRequestMethod, HostingTelemetryHelpers.GetNormalizedHttpMethod(method));
}

Similarly, when the request ends, the server calls Add() to decrement the value of the counter:

public void RequestEnd(string protocol, string scheme, string method, string? route, int statusCode, bool unhandledRequest, Exception? exception, List<KeyValuePair<string, object?>>? customTags, long startTimestamp, long currentTimestamp, bool disableHttpRequestDurationMetric)
{
    var tags = new TagList();
    InitializeRequestTags(ref tags, scheme, method);

    // Tags must match request start.
    if (_activeRequestsCounter.Enabled)
    {
        _activeRequestsCounter.Add(-1, tags);
    }

    // ...
}

Consequently, the UpDownCounter<T> receives a series of increment/decrement values representing the movement of the metric.
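The same increment/decrement pattern applies to your own code. As a minimal sketch, here's a hypothetical WorkQueue wrapper that reports its length as a stream of +1/-1 deltas. The class, the myapp.queue.length metric name, and the queue.name tag are assumptions for illustration only.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

public sealed class WorkQueue
{
    private readonly ConcurrentQueue<string> _items = new();
    private readonly UpDownCounter<long> _length;
    private readonly KeyValuePair<string, object?> _queueTag;

    public WorkQueue(Meter meter, string queueName)
    {
        _length = meter.CreateUpDownCounter<long>(
            "myapp.queue.length",
            unit: "{item}",
            description: "Number of items currently waiting in the queue.");

        // Reuse the same tag for increments and decrements so the deltas cancel out.
        _queueTag = new KeyValuePair<string, object?>("queue.name", queueName);
    }

    public void Enqueue(string item)
    {
        _items.Enqueue(item);
        _length.Add(1, _queueTag);
    }

    public bool TryDequeue(out string? item)
    {
        if (!_items.TryDequeue(out item))
        {
            return false; // nothing was removed, so no delta is reported
        }

        _length.Add(-1, _queueTag);
        return true;
    }
}
```

As with the ASP.NET Core example, the important detail is that the tags used for the decrement match the tags used for the increment; otherwise a consumer can't reconcile the two.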

`ObservableUpDownCounter<T>`

The ObservableUpDownCounter<T> is similar to the UpDownCounter<T> in that it reports increasing or decreasing values of a metric. The difference is that it returns the absolute (current total) value of the metric when observed, as opposed to a stream of deltas.

For example, the dotnet.gc.last_collection.heap.size metric is an ObservableUpDownCounter<long> that reports "The managed GC heap size (including fragmentation), as observed during the latest garbage collection":

s_meter.CreateObservableUpDownCounter(
    "dotnet.gc.last_collection.heap.size",
    GetHeapSizes,
    unit: "By",
    description: "The managed GC heap size (including fragmentation), as observed during the latest garbage collection.");

When observed, the GetHeapSizes() method is invoked and returns a collection of Measurements, each tagged by the heap generation name:

private static readonly string[] s_genNames = ["gen0", "gen1", "gen2", "loh", "poh"];
private static readonly int s_maxGenerations = Math.Min(GC.GetGCMemoryInfo().GenerationInfo.Length, s_genNames.Length);

private static IEnumerable<Measurement<long>> GetHeapSizes()
{
    GCMemoryInfo gcInfo = GC.GetGCMemoryInfo();

    for (int i = 0; i < s_maxGenerations; ++i)
    {
        yield return new Measurement<long>(gcInfo.GenerationInfo[i].SizeAfterBytes, new KeyValuePair<string, object?>("gc.heap.generation", s_genNames[i]));
    }
}

This returns the size of each heap at the last garbage collection, the value of which may obviously increase or decrease.
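If you want to report several tagged values from your own observable instrument, the shape is much the same. The sketch below assumes a hypothetical ConnectionPoolMetrics class that tracks open connections per pool; the metric name and the pool.name tag are made up for illustration.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

public sealed class ConnectionPoolMetrics
{
    // Current number of open connections, keyed by (hypothetical) pool name.
    private readonly ConcurrentDictionary<string, long> _openConnections = new();

    public ConnectionPoolMetrics(Meter meter)
    {
        meter.CreateObservableUpDownCounter(
            "myapp.db.connections.open",
            GetOpenConnections,
            unit: "{connection}",
            description: "Number of currently open connections per pool.");
    }

    // These only update the dictionary; nothing is emitted until the instrument is observed.
    public void Opened(string pool) => _openConnections.AddOrUpdate(pool, 1, (_, n) => n + 1);
    public void Closed(string pool) => _openConnections.AddOrUpdate(pool, 0, (_, n) => n - 1);

    // Invoked when a consumer observes the instrument: one absolute value per pool.
    private IEnumerable<Measurement<long>> GetOpenConnections()
    {
        foreach (var (pool, count) in _openConnections)
        {
            yield return new Measurement<long>(count, new KeyValuePair<string, object?>("pool.name", pool));
        }
    }
}
```

Each observation reports the current count for every pool, rather than the individual open/close events that led to it.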

`Gauge<T>`

The Gauge<T> is used to record "non-additive" values whenever they occur. These values can go up and down, and be positive or negative, but the point is that they "overwrite" all previous values.

Interestingly, this Instrument type was only added in .NET 9, and I couldn't find a single case of Gauge<T> being used in the .NET runtime, ASP.NET Core, or the .NET extensions packages 😅 So I made one up: for example, consider a gauge that reports the current room temperature when it changes:

var instrument = _meter.CreateGauge<double>(
    name: "locations.room.temperature",
    unit: "°C",
    description: "Current room temperature"
);

Then when the temperature of the room changes, you would report the new value:

public void OnOfficeTemperatureChanged(double newTemperature)
{
    instrument.Record(newTemperature, new KeyValuePair<string, object?>("room", "office"));
}

The gauge values are recorded whenever the temperature changes.

`ObservableGauge<T>`

Conceptually the ObservableGauge<T> is the same as a Gauge<T>, except that it only produces a value when observed. ObservableGauge<T> was added way back in .NET 6, so in this case there are some real-world examples of its use.

For example, the process.cpu.utilization metric is an ObservableGauge<double> instrument which reports "The CPU consumption of the running application in range [0, 1]":

_ = meter.CreateObservableGauge(
    name: "process.cpu.utilization",
    observeValue: CpuPercentage);

When observed, the CpuPercentage() method is invoked, which returns a single value for the CPU usage as a value between 0 and 1.

private double CpuPercentage()
{
    // see above link for implementation
}

This Instrument is exposed in the Microsoft.Extensions.Diagnostics.ResourceMonitoring meter, and implemented in the Microsoft.Extensions.Diagnostics.ResourceMonitoring NuGet package.
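To give a feel for what an ObservableGauge<double> callback might look like, here's a rough sketch that derives a CPU utilization value from the process's total processor time. To be clear, this is not how the ResourceMonitoring package implements it; the CpuUtilizationGauge class and the myapp.process.cpu.utilization metric name are hypothetical, and the calculation is a deliberately simple approximation.

```csharp
using System;
using System.Diagnostics;
using System.Diagnostics.Metrics;

public sealed class CpuUtilizationGauge
{
    private readonly object _lock = new();
    private TimeSpan _lastCpuTime;
    private long _lastTimestamp;

    public CpuUtilizationGauge(Meter meter)
    {
        _lastCpuTime = ReadCpuTime();
        _lastTimestamp = Stopwatch.GetTimestamp();

        meter.CreateObservableGauge(
            name: "myapp.process.cpu.utilization",
            observeValue: Sample,
            unit: "1",
            description: "Approximate CPU utilization since the previous observation, in range [0, 1].");
    }

    // Returns the fraction of available CPU time used since the previous call.
    public double Sample()
    {
        lock (_lock) // the callback may be invoked by more than one consumer
        {
            TimeSpan cpuTime = ReadCpuTime();
            long now = Stopwatch.GetTimestamp();

            double elapsedSeconds = Stopwatch.GetElapsedTime(_lastTimestamp, now).TotalSeconds;
            double usedSeconds = (cpuTime - _lastCpuTime).TotalSeconds;

            _lastCpuTime = cpuTime;
            _lastTimestamp = now;

            if (elapsedSeconds <= 0)
            {
                return 0;
            }

            // Normalize by core count so that "all cores busy" reports as 1.
            return Math.Clamp(usedSeconds / (elapsedSeconds * Environment.ProcessorCount), 0.0, 1.0);
        }
    }

    private static TimeSpan ReadCpuTime()
    {
        using var process = Process.GetCurrentProcess();
        return process.TotalProcessorTime;
    }
}
```

Note that the value returned here replaces the previous one rather than adding to it, which is what makes a gauge the appropriate instrument type.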

`Histogram<T>`

The final instrument type is Histogram<T>, which is used to report arbitrary values that you will typically want to aggregate using statistics.

For example, the http.server.request.duration metric is a Histogram<double> which records the "Duration of HTTP server requests.". Durations and latencies are a classic example of where you might want to use a histogram, so that you can calculate the p50, p90, p99 etc latencies, or to record all the values and plot them as a graph.

_requestDuration = _meter.CreateHistogram<double>(
    "http.server.request.duration",
    unit: "s",
    description: "Duration of HTTP server requests.",
    advice: new InstrumentAdvice<double> { HistogramBucketBoundaries = MetricsConstants.ShortSecondsBucketBoundaries });

The example above also shows our first example of InstrumentAdvice<T>. This type provides suggested configuration settings for consumers, indicating the best settings to use when processing Instrument values. In this case, the advice provides a suggested set of histogram bucket boundaries: [0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1, 2.5, 5, 7.5, 10], which can be useful for consumers to know how best to plot the metric values.
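You can provide the same kind of advice for your own histograms. The following is a minimal sketch, assuming .NET 9 or later (where InstrumentAdvice<T> is available); the CheckoutMetrics class, the myapp.checkout.duration metric name, and the bucket boundaries are all hypothetical values chosen for illustration.

```csharp
using System.Diagnostics.Metrics;

public sealed class CheckoutMetrics
{
    // Hypothetical boundaries, in seconds, in ascending order.
    private static readonly double[] s_bucketBoundaries = [0.1, 0.5, 1, 2, 5, 10, 30];

    public Histogram<double> Duration { get; }

    public CheckoutMetrics(Meter meter)
    {
        Duration = meter.CreateHistogram<double>(
            "myapp.checkout.duration",
            unit: "s",
            description: "Duration of checkout operations.",
            advice: new InstrumentAdvice<double> { HistogramBucketBoundaries = s_bucketBoundaries });
    }

    // Records a single checkout duration, in seconds.
    public void CheckoutCompleted(double seconds) => Duration.Record(seconds);
}
```

A consumer can read the suggestion back from Duration.Advice?.HistogramBucketBoundaries, but it's only a hint: consumers are free to ignore it and choose their own aggregation.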

The _requestDuration histogram instrument is called whenever an ASP.NET Core request ends, recording the duration of the request and a large number of associated tags. I've reproduced all the code below for completeness (expanding tag constants for clarity), but it's basically just building up a collection of tags which are recorded along with the duration of the request.

public void RequestEnd(string protocol, string scheme, string method, string? route, int statusCode, bool unhandledRequest, Exception? exception, List<KeyValuePair<string, object?>>? customTags, long startTimestamp, long currentTimestamp, bool disableHttpRequestDurationMetric)
{
    var tags = new TagList();
    InitializeRequestTags(ref tags, scheme, method);

    if (!disableHttpRequestDurationMetric && _requestDuration.Enabled)
    {
        if (HostingTelemetryHelpers.TryGetHttpVersion(protocol, out var httpVersion))
        {
            tags.Add("network.protocol.version", httpVersion);
        }
        if (unhandledRequest)
        {
            tags.Add("aspnetcore.request.is_unhandled", true);
        }

        // Add information gathered during request.
        tags.Add("http.response.status_code", HostingTelemetryHelpers.GetBoxedStatusCode(statusCode));
        if (route != null)
        {
            tags.Add("http.route", RouteDiagnosticsHelpers.ResolveHttpRoute(route));
        }

        // Add before some built in tags so custom tags are prioritized when dealing with duplicates.
        if (customTags != null)
        {
            for (var i = 0; i < customTags.Count; i++)
            {
                tags.Add(customTags[i]);
            }
        }

        // This exception is only present if there is an unhandled exception.
        // An exception caught by ExceptionHandlerMiddleware and DeveloperExceptionMiddleware isn't thrown to here. Instead, those middleware add error.type to custom tags.
        if (exception != null)
        {
            // Exception tag could have been added by middleware. If an exception is later thrown in request pipeline
            // then we don't want to add a duplicate tag here because that breaks some metrics systems.
            tags.TryAddTag("error.type", exception.GetType().FullName);
        }
        else if (HostingTelemetryHelpers.IsErrorStatusCode(statusCode))
        {
            // Add error.type for 5xx status codes when there's no exception.
            tags.TryAddTag("error.type", statusCode.ToString(CultureInfo.InvariantCulture));
        }

        var duration = Stopwatch.GetElapsedTime(startTimestamp, currentTimestamp);
        _requestDuration.Record(duration.TotalSeconds, tags);
    }
}

It's interesting to note that while the histogram is strictly about request durations, the presence of the many tags could enable you to derive various other metrics. For example, you could determine the number of "successful" requests, the number of requests to a particular route, or the number of requests with a given status code.

And that's it, we've covered all of the Instrument types currently available in .NET 10. Note that there's no ObservableHistogram<T> type, as that generally wouldn't be practical to implement.

We now know how to create all the different types of Instrument, and in the first post of this series I showed how to record the metrics using dotnet-counters. In the following post in this series, we'll look at how to record these values in-process instead.

Summary

In this post, I described each of the different Instrument<T> types exposed by the System.Diagnostics.Metrics APIs. For each type I described when you would use it and provided an example of both how to create the Instrument<T>, and how to record values, using examples from the .NET base class libraries and ASP.NET Core. In the next post we'll look at how to record values produced by Instrument<T> types in-process.