Drawing graphs in GitHub comments with Mermaid diagrams

In this post I show how I managed to draw graphs in GitHub PR comments using Mermaid diagrams!

Background: why draw graphs in GitHub comments?

In our work on the Datadog .NET tracer, we have to be very aware of the performance impact of the tracer on a customer's application. That means:

We need to be able to estimate the "real world" impact of a feature that affects performance
We need to be careful not to accidentally introduce performance regressions
We need to monitor the longer-term performance, so that we don't accidentally introduce a large performance regression through many minor cuts.

We track and monitor these in three main ways:

Micro-benchmarks using BenchmarkDotNet
Throughput tests, which hammer a web application with requests, using Crank
Execution-time tests, which measure the complete time for an app to run (to account for one-off initialization time), using timeit.

Obviously we send the results of all these to Datadog, and track the results in dashboards and over time. These are all checked before a release to make sure performance meets our expectations.

However, if you accidentally introduce a performance regression in a PR, it's preferable to get that feedback immediately instead of having to bisect it later on. GitHub PR comments are a great way to do that: they're immediately visible, they don't block merging (performance regressions are sometimes expected and/or acceptable), and they can be posted programmatically from the build pipeline.

Posting comments to GitHub

For some time we have been posting comments on GitHub PRs comparing the code coverage changes:

GitHub PR showing a Code Coverage report

as well as the results of micro-benchmarks, highlighting any statistically significant changes (calculated using the handy Perfolizer library):

GitHub PR benchmark comparison report

We post these results to GitHub directly from our build pipeline (which uses Nuke). The following function uses the GitHub REST API to post the provided markdown as a comment to PR number prNumber, using the provided gitHubToken (which is stored as a secret in the build pipeline):

using System;
using System.Net.Http;
using System.Net.Http.Json; // for PostAsJsonAsync
using System.Threading.Tasks;

static async Task PostCommentToPullRequest(string gitHubToken, int prNumber, string markdown)
{
    Console.WriteLine("Posting comment to GitHub");

    // Dispose the client when we're done; this runs once per build
    using var httpClient = new HttpClient();
    httpClient.DefaultRequestHeaders.Add("Accept", "application/vnd.github.v3+json");
    httpClient.DefaultRequestHeaders.Add("Authorization", $"token {gitHubToken}");
    // The GitHub API rejects requests that don't send a User-Agent header
    httpClient.DefaultRequestHeaders.UserAgent.Add(new(new System.Net.Http.Headers.ProductHeaderValue("nuke-ci-client")));

    // PRs are issues as far as the comments API is concerned
    var url = $"https://api.github.com/repos/DataDog/dd-trace-dotnet/issues/{prNumber}/comments";
    Console.WriteLine($"Sending request to '{url}'");

    var result = await httpClient.PostAsJsonAsync(url, new { body = markdown });

    if (result.IsSuccessStatusCode)
    {
        Console.WriteLine("Comment posted successfully");
    }
    else
    {
        // Log the response body before throwing, as it usually explains the failure
        var response = await result.Content.ReadAsStringAsync();
        Console.WriteLine("Error: " + response);
        result.EnsureSuccessStatusCode();
    }
}

This all works well, and we wanted to do something similar for our throughput tests. But when it came to these tests, we really wanted to show graphs comparing the throughput results between branches. That's what we were used to in Datadog, and we didn't really want to go backwards to tables of numbers; we really wanted to show graphs in the markdown.

The trouble is, generating graphs that could be embedded in the markdown seemed like a pain. I think it would be possible to embed an image by base64 encoding it as an inline data: image, but instead I went for a "simpler" approach: hacking mermaid diagrams.

Mermaid diagrams in GitHub

GitHub introduced support for Mermaid diagrams last year. To quote the Mermaid project:

Mermaid is a JavaScript-based diagramming and charting tool that uses Markdown-inspired text definitions and a renderer to create and modify complex diagrams

You can embed a mermaid code block into a GitHub markdown block, and GitHub will automatically render it as an image using mermaid.js. For example, the following code block:

```mermaid
flowchart LR
    A[Hard] -->|Text| B(Round)
    B --> C{Decision}
    C -->|One| D[Result 1]
    C -->|Two| E[Result 2]
```

Would be rendered as:

Mermaid diagram

Mermaid supports a bunch of different types of diagram:

Sequence diagram
Flow chart
Class diagram
State diagram
Entity relationship diagram
User journey
Pie chart
Gantt chart
Mindmap
Git graph

However, you'll notice that none of these are the bar or line charts that I really wanted. So some imagination was necessary.

Hacking mermaid diagrams to produce bar charts

The revelation I had was with Gantt charts. If you squint, you can sort of see the "bars" we're aiming for in this Gantt chart:

```mermaid
gantt
    title A Gantt Diagram
    dateFormat  YYYY-MM-DD
    section Section
    A task           :a1, 2014-01-01, 3d
    
    section Another
    Task in sec      :2014-01-02  , 1d
    another task      : 2d
```

which renders as

An example Gantt chart

It's not perfect though. In order to use this chart there are a few things we need to "fix":

Gantt charts work in terms of dates (you can see dates along the axis). We need to figure out a way to work with raw numbers (i.e. number of requests in throughput tests).
Similarly, the size of the bars is specified in dates. Again, we need to provide dimensionless numbers instead of dates.
We need to "group" results together. For our throughput tests we run multiple scenarios, and we want to compare between multiple branches for a given scenario.

The solution to the first two points is to use mermaid's support for specifying times as Unix Epoch values (seconds or milliseconds). Our throughput tests give results that are roughly "millions of requests" (i.e. 3-6 million requests). These values are small relative to "typical" Unix Epoch values (e.g. 1,676,197,001), but the important point is that they're valid.

To use Unix timestamps in the mermaid Gantt chart, we need to specify two settings:

dateFormat X: this specifies that the value we provide for each Gantt element is a Unix Epoch timestamp in seconds
axisFormat %s: this is the d3-time-format format string for a Unix Epoch timestamp in seconds, so that we display the raw values on the axis.

That's the format issues resolved. The grouping issue is relatively easy to solve using the support for "sections":

Each scenario is placed in its own section.
Each section contains the results for the candidate branch (the PR), the master branch, and some existing "benchmark" branches. These are displayed in the same order in each section, for easy comparison.

Putting it all together means we can create a chart that looks like the following:

```mermaid
gantt
    title Throughput Linux x64 (Total requests) 
    todayMarker off
    dateFormat  X
    axisFormat %s

    section Baseline
    This PR (3781) (6.170M)   : 0, 6170460
    master (5.970M)   : 0, 5969974
    benchmarks/2.23.0 (6.211M)   : 0, 6211402
    benchmarks/2.9.0 (6.225M)   : 0, 6224510

    section Automatic
    This PR (3781) (4.134M)   : 0, 4134494
    master (4.235M)   : 0, 4235043
    benchmarks/2.23.0 (4.255M)   : 0, 4254809
    benchmarks/2.9.0 (4.389M)   : 0, 4388983

    section Manual
    This PR (3781) (5.199M)   : 0, 5199286
    master (5.335M)   : 0, 5335475
    benchmarks/2.23.0 (5.464M)   : 0, 5464474

    section Manual + Automatic
    This PR (3781) (3.922M)   : 0, 3921636
    master (4.023M)   : 0, 4023108
    benchmarks/2.23.0 (4.095M)   : 0, 4094835
```

Which, when rendered, looks something like this:

The above Gantt chart rendered using mermaid

Each bar is defined using the format

<BranchName> : 0, <Value>

The 0 specifies the start-point of the bar; we obviously want all the bars to start at the same point. The <Value> is the number of requests, and is used for the "duration" of the element, creating the bars.

The rendering obviously isn't perfect (more on that later) but overall it does a decent enough job of showing a "single value" for the bar.
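Generating this markup from a build script is mostly string concatenation. The following is a minimal sketch of how it could be done, not the code we actually run: the BranchResult and ScenarioResults records and the ThroughputChart.Build method are hypothetical names invented for this example, and it assumes that labels never contain a ':' (which would confuse the Gantt syntax).

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical types for illustration; they are not taken from the real build code.
public record BranchResult(string Branch, long TotalRequests);
public record ScenarioResults(string Scenario, IReadOnlyList<BranchResult> Branches);

public static class ThroughputChart
{
    // Builds a fenced mermaid Gantt block with one section per scenario
    // and one bar per branch. Assumes labels contain no ':' characters.
    public static string Build(string title, IEnumerable<ScenarioResults> scenarios)
    {
        var fence = new string('`', 3); // the markdown code fence
        var sb = new StringBuilder();
        sb.Append(fence).Append("mermaid\n");
        sb.Append("gantt\n");
        sb.Append($"    title {title}\n");
        sb.Append("    todayMarker off\n");
        sb.Append("    dateFormat  X\n");   // values are Unix Epoch seconds
        sb.Append("    axisFormat %s\n");   // print the raw value on the axis

        foreach (var scenario in scenarios)
        {
            sb.Append('\n');
            sb.Append($"    section {scenario.Scenario}\n");
            foreach (var result in scenario.Branches)
            {
                // e.g. 6170460 => "6.170", shown in the label as "(6.170M)"
                var millions = (result.TotalRequests / 1_000_000.0)
                    .ToString("F3", CultureInfo.InvariantCulture);
                sb.Append($"    {result.Branch} ({millions}M)   : 0, {result.TotalRequests}\n");
            }
        }

        sb.Append(fence).Append('\n');
        return sb.ToString();
    }
}
```

The returned string can be passed as the markdown argument to PostCommentToPullRequest, either on its own or as part of a larger comment.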

As a bonus, we can also automatically highlight concerning results, by marking the element as critical:

<BranchName> : crit, 0, <Value>

This prints the bar in red, and makes it easy to spot potentially problematic values at a glance:

Highlighting problematic bars in red
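What counts as "concerning" is up to you. As a rough sketch, the helper below flags a result when it falls more than a given percentage below a reference value (for example, the master branch). Both the rule and the 5% default are assumptions made for this example rather than the thresholds we actually use, and GanttRows, IsConcerning, and Bar are hypothetical names.

```csharp
using System;

public static class GanttRows
{
    // Hypothetical rule: a result is concerning when it is more than
    // thresholdPercent below the reference value (e.g. the master branch).
    public static bool IsConcerning(long value, long reference, double thresholdPercent = 5.0)
        => value < reference * (1.0 - thresholdPercent / 100.0);

    // Emits a single bar, adding the crit tag so mermaid renders it in red.
    public static string Bar(string label, long value, bool critical)
        => critical
            ? $"    {label} : crit, 0, {value}"
            : $"    {label} : 0, {value}";
}
```

For example, Bar("This PR (3781) (3.900M)", 3900000, IsConcerning(3900000, 4235043)) produces a crit bar, because 3,900,000 is more than 5% below 4,235,043.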

Hacking mermaid diagrams to produce box-plots

As well as the "bar chart" for throughput tests, I also wanted to draw something akin to a box-and-whisker plot for our execution time tests. I experimented with various options but the best I could come up with is shown below:

Rendering 'box plots' using mermaid

This chart shows the 99% confidence interval for the values, based on the mean and standard deviation of the results. The mean value is shown as a diamond below the graph. This is obviously a fair way from showing all the detail that you get in a box-and-whisker plot, but it's the best option I could find 🤷‍♂️
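For reference, here is one way the bounds could be calculated from the raw samples. This is a sketch under stated assumptions: it uses the normal approximation for the confidence interval of the mean (mean ± 2.576 × standard error), which may differ from the exact calculation in our pipeline, and ExecutionTimeStats and ConfidenceInterval99 are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExecutionTimeStats
{
    // Two-sided 99% critical value of the standard normal distribution
    private const double Z99 = 2.576;

    // Returns the mean and the lower/upper bounds of an approximate 99%
    // confidence interval for the mean (an assumption; see the text above).
    public static (double Mean, double Lower, double Upper) ConfidenceInterval99(
        IReadOnlyCollection<double> samples)
    {
        if (samples.Count < 2)
        {
            throw new ArgumentException("At least two samples are required", nameof(samples));
        }

        var mean = samples.Average();
        // Sample variance (n - 1 in the denominator)
        var variance = samples.Sum(x => (x - mean) * (x - mean)) / (samples.Count - 1);
        var standardError = Math.Sqrt(variance / samples.Count);
        var margin = Z99 * standardError;

        return (mean, mean - margin, mean + margin);
    }
}
```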

The main difference between these plots and the bar chart is that instead of starting from 0, the elements define the lower and upper bound for the bar using:

<BranchName> : <Lower>, <Upper>

The diamond for the "mean" is created using a milestone marker, with an empty name (or . anyway, as you must provide some sort of name):

.   : milestone, <Mean>,

When you put it all together, you end up with something like this (which created the above diagram):

```mermaid
gantt
    title Execution time (ms) FakeDbCommand (.NET Framework 4.6.2) 
    todayMarker off
    dateFormat  X
    axisFormat %s
    
    section Baseline
    This PR (3781) - mean (3,324ms)  : 3235, 3412
     .   : milestone, 3324,
    master - mean (3,320ms)  : 3250, 3390
     .   : milestone, 3320,

    section CallTarget+Inlining+NGEN
    This PR (3781) - mean (3,323ms)  : 3253, 3393
     .   : milestone, 3323,
    master - mean (3,327ms)  : 3254, 3399
     .   : milestone, 3327,
```

As with the bar charts, we can mark problematic results in red using crit.
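The two rows for each result (the range bar plus its milestone) are easy to emit together. The sketch below shows one way to do that; ExecutionTimeChart.Rows is a hypothetical helper written for this post, and it assumes the mean and bounds have already been calculated, in milliseconds.

```csharp
using System;
using System.Globalization;

public static class ExecutionTimeChart
{
    // Emits the range bar for a result, followed by a "." milestone at the mean.
    // Pass critical: true to add the crit tag and render the bar in red.
    public static string Rows(string label, double mean, double lower, double upper, bool critical = false)
    {
        var meanLabel = mean.ToString("N0", CultureInfo.InvariantCulture); // e.g. 3,324
        var start = (long)Math.Round(lower);
        var end = (long)Math.Round(upper);
        var marker = (long)Math.Round(mean);
        var tags = critical ? "crit, " : string.Empty;

        return $"    {label} - mean ({meanLabel}ms)  : {tags}{start}, {end}\n"
             + $"     .   : milestone, {marker},\n";
    }
}
```

Calling Rows("master", 3320, 3250, 3390) gives the same two lines as the master entry in the Baseline section above.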

Of course, it's buggy

And now we come to the downsides. Even if you exclude the limited features and options available when hacking Gantt charts like this, the results are slightly frustratingly buggy. As far as I can tell, these bugs lie either in the mermaid.js library itself, or in the way GitHub uses it, but I haven't looked into the issues any further than that.

First of all, there are the layout issues. In the above diagrams, the longer section labels overlap the bars. That's a bit annoying, but I couldn't see an easy way of fixing it automatically.

Secondly, the axis displays ticks at annoyingly random-looking values. They're not random, but it sure looks like it, and I couldn't get the tickInterval to work as I wanted (just put ticks on round numbers, please!). This one is probably on me, but I gave up on it.

Now we get to the more problematic issues. First of all: scaling. If you have your monitor set to anything other than 100%, things go a bit wonky. Notice how the labels for each bar are no longer on the bar? Instead they're shunted all the way to the left, making them pretty useless.

The bar labels are all messed up

Another tiny one is that the dark mode version just looks a bit harder to read IMO:

Dark mode is harder to read

And if you try to use done (in addition to crit) as another way of highlighting graphs, then it gets impossible to read in dark mode!

Still, overall, it's a relatively simple way to get a few graphics into our PR comments with pretty low difficulty!

Summary

In this post I showed how I added graphs to GitHub PR comments by abusing mermaid diagram Gantt charts. I showed two different types of graphs (bar chart and box-plots) that you can create, albeit with a number of limitations. It's certainly not a high-fidelity or brilliant solution, but it gets the job done for the few charts we need to display!

Andrew Lock | .NET Escapades