
Comment by jiggawatts

3 years ago

I whipped up a realistic version of the Fortunes benchmark as it would be written by a typical .NET developer, i.e. someone writing normal code instead of trying to game a benchmark.

I used the ASP.NET default project template and settings for .NET 7, using Dapper.NET, Microsoft.Data.SqlClient, and SQL Server 2019. I disabled the default layout to match the HTML expected by the Techempower benchmark, but I used a Razor template.

On an 8-core laptop[1] this yields about 33K requests per second, which is a far cry from the ~200K numbers being reported on Techempower.

I suspect I could nearly double that by optimising the test (e.g.: running the test on a separate machine), turning off various background processes, etc... none of which is "gaming" the application code itself. For example, using HTTP instead of HTTPS bumps the numbers up to 40K rps all by itself.

On the other hand, simply enabling Session State causes that to drop by about 4K rps.

I would like to see something akin to Techempower, but for a more realistic app that has JWT-based auth, sessions, multiple database queries, and reasonably complex markup. HTTPS would have to be enabled, compression on, and HTTP request logging enabled. Basically, the framework should be configured in the same way it would be in production.
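
To make that concrete, here is a minimal sketch of the kind of production-like setup I mean, in Program.cs terms; the specific options are illustrative assumptions, not the exact settings I benchmarked:

    // A sketch of "configured like production": HTTPS redirection, response
    // compression, HTTP request logging, and session state all switched on.
    // (Illustrative options only, not the exact benchmark configuration.)
    var builder = WebApplication.CreateBuilder(args);

    builder.Services.AddRazorPages();
    builder.Services.AddResponseCompression();
    builder.Services.AddHttpLogging(options => { });   // method, path, status code, headers
    builder.Services.AddDistributedMemoryCache();      // backing store required by sessions
    builder.Services.AddSession();                      // enabling this alone cost ~4K rps in my test

    var app = builder.Build();

    app.UseHttpsRedirection();
    app.UseResponseCompression();
    app.UseHttpLogging();
    app.UseSession();
    app.MapRazorPages();
    app.Run();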

Query code:

    // Fetch all fortunes, append one at request time, then sort by message.
    public async Task OnGetAsync()
    {
        fortunes = (await _connection.QueryAsync<FortuneEntry>("SELECT id, message FROM Fortune")).ToList();
        fortunes.Add(new FortuneEntry(0, "Additional fortune added at request time."));
        fortunes.Sort((a, b) => a.message.CompareTo(b.message));
    }


Razor code for the table:

    <table>
    <tr><th>id</th><th>message</th></tr>
    @foreach (var f in Model.fortunes)
    {
        <tr><td>@f.id</td><td>@f.message</td></tr>
    }
    </table>
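
Both snippets above bind to a record and a page-model field roughly along these lines (a reconstruction of the surrounding declarations; only the shape matters):

    // Assumed supporting declarations: lowercase property names match the
    // Razor template's @f.id / @f.message and Dapper's column mapping.
    using System.Data;
    using Microsoft.AspNetCore.Mvc.RazorPages;

    public record FortuneEntry(int id, string message);

    public class FortunesModel : PageModel
    {
        private readonly IDbConnection _connection;   // e.g. a SqlConnection from DI
        public List<FortuneEntry> fortunes = new();   // enumerated as Model.fortunes in the view

        public FortunesModel(IDbConnection connection) => _connection = connection;

        // OnGetAsync() from the query code above lives here.
    }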

[1] https://ark.intel.com/content/www/us/en/ark/products/213803/...

> HTTPS would have to be enabled, compression on, and HTTP request logging enabled. Basically, the framework should be configured in the same way it would be in production.

Kestrel can be directly internet-facing, but that's not its wheelhouse. Instead, in a typical high-traffic scenario, it sits behind a reverse proxy like nginx or Apache, which in turn handles HTTPS, compression, etc.[1]

[1] https://learn.microsoft.com/en-us/aspnet/core/host-and-deplo...

  • The concept of "offloading" TLS and compression belongs firmly to the 1990s. As you saw in my benchmark, the percentage difference is small, and handling them in-process keeps both complexity and latency lower.

    To correctly handle HTTPS offload, web frameworks have to "pick up" the X-Forwarded-For and X-Forwarded-Proto headers. This needs additional config or code in many frameworks, including ASP.NET Core (see the sketch below): https://learn.microsoft.com/en-us/aspnet/core/host-and-deplo...

    If you forget, the result is a redirect loop. By "you" I mean the developer at some other company whose product you're trying to deploy behind NGINX. This happens to me every few months: some Random Product(tm) doesn't work properly because it insists on HTTPS despite already sitting behind an HTTPS ingress solution.

    No big deal, you say, just add the setting and move on? Bzzt... now you've allowed those headers to be injected into your application by random end-users out on the internet, spoofing source IP addresses in your logs, etc.

    So now your web app code must be aware of your authorised reverse proxy servers. This also has to be wired up, managed, and set in a config file somewhere.
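
    A minimal sketch of that wiring in ASP.NET Core, assuming a single authorised proxy whose address (10.0.0.5) is made up for illustration:

        // Program.cs: honour X-Forwarded-For / X-Forwarded-Proto, but only when the
        // request arrives from the authorised reverse proxy.
        using Microsoft.AspNetCore.HttpOverrides;
        using System.Net;

        var builder = WebApplication.CreateBuilder(args);

        builder.Services.Configure<ForwardedHeadersOptions>(options =>
        {
            options.ForwardedHeaders = ForwardedHeaders.XForwardedFor | ForwardedHeaders.XForwardedProto;
            options.KnownProxies.Add(IPAddress.Parse("10.0.0.5"));   // made-up address for illustration
        });

        var app = builder.Build();
        app.UseForwardedHeaders();   // must run before redirects, auth, or anything reading the scheme/client IP
        app.MapGet("/", () => "Hello");
        app.Run();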

    You now also have a new point of failure, a new location that needs performance tuning, scaling, etc...

    Fundamentally, a web server ought to be able to stream static content from memory cache about as fast as the wire can handle it, in which case every "hop" you add also has to sustain the same throughput. If your web server farm scales to 10 servers of 1 Gbps each, then your reverse proxy tier must scale to 10 Gbps, or equivalent.

    For n such layers, the usable fraction of your total available bandwidth drops to 1/n.

    Take a typical cloud-hosted Kubernetes solution with a web front end talking to an API tier (god help me I've seen too many of these!), and you could end up with 10+ layers, for 10% efficiency. E.g.:

    Cloud load balancer -> Kubernetes Ingress -> Kubernetes Service -> Kubernetes NAT -> NGINX pod -> ... 3x ...

    If you've ever wondered why modern apps "feel slow" despite theoretically great throughput... now you know.

I'm curious what this would be like without Dapper. Dapper's performance was awful compared to rolling our own mapping that generated a static collection of delegates. This was quite a while ago, with .NET Framework, though.

  • As far as I know, Dapper performance is within a few percent of "hand rolled" data reader code, because it dynamically compiles the query reader code and then caches the delegate.

    Update: a quick test with hand-rolled async query code got me to 41.5K rps, up 3.8% from 40K with Dapper, which doesn't seem worth it. Using non-async produces 43.2K rps for 8% higher perf. That goes to show that using async doesn't necessarily improve performance even under high load!
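
    For reference, "hand-rolled" here means roughly the following shape (a sketch, not the exact benchmark code):

        // Read both columns by ordinal and map them manually instead of letting
        // Dapper generate and cache the mapping delegate.
        using Microsoft.Data.SqlClient;

        static async Task<List<FortuneEntry>> QueryFortunesAsync(SqlConnection connection)
        {
            var results = new List<FortuneEntry>();
            using var command = new SqlCommand("SELECT id, message FROM Fortune", connection);
            using var reader = await command.ExecuteReaderAsync();
            while (await reader.ReadAsync())
            {
                results.Add(new FortuneEntry(reader.GetInt32(0), reader.GetString(1)));
            }
            return results;
        }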

    Entity Framework used to have atrocious performance, but it has been rewritten to work more like Dapper and is quite fast now.

    The general issue is that any form of database query in .NET will produce a lot of garbage on the heap. Reading a bunch of "string" values out of a database will... allocate a bunch of strings.

    It would be theoretically possible to convert this kind of code to use ref structs, Span<char>, etc., but then the Razor templating engine wouldn't be able to consume it. Similarly, it would be pointless if the output is something like JSON, produced by serializing an object graph. Garbage, garbage, and more garbage for the GC to clean up.

    This is why Techempower is so unrealistic. Real developers never write code with "hand rolled delegates", and they shouldn't. They should be writing clear, concise, high-level code using ORMs and template languages like Razor. Not emitting byte arrays to shave nanoseconds off.

    Ideally, languages and frameworks should let developers have their cake and eat it too. Rust, for example, generally allows high-level code to be written while minimising heap usage.