
Comment by cmrdporcupine

9 months ago

The current popularity of the async stuff has its roots in the classic "c10k" problem. (https://en.wikipedia.org/wiki/C10k_problem)

There is a perception among some that threads are expensive, especially when "wasted" on blocking I/O, and that using them in that domain "won't scale."

Putting aside that not all of us are building web applications (heterodox here on HN, I know)...

Most people in the real world with real applications will not hit the limits of what is possible and efficient and totally fine with thread-based architectures.

Plus the kernel has gotten more efficient with threads over the years.

Plus hardware has gotten way better, and better at handling concurrent access.

Plus async involves other trade-offs -- running a state machine behind the scenes that's doing the kind of context switching the kernel & hardware potentially already do for threads, but in user space. If you ever pull up a debugger and step through an async Rust/tokio codebase, you'll get a good sense for what the overhead here we're talking about is.

That overhead is fine if you're sitting there blocking on your database server, or some HTTP socket, or some filesystem.

It's ... probably... not what you want if you're building a game or an operating system or an embedded device of some kind.
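To make the "state machine behind the scenes" point a bit more concrete, here is a rough sketch using only std. `TwoStep` is a hand-written stand-in for the kind of state machine the compiler generates from an `async fn`, and `run_counting_polls` is a deliberately naive busy-polling executor. Both names are made up for this illustration; this is not how tokio is implemented, just the general shape of what gets stepped through.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical hand-written future: each variant is one "resume point",
/// roughly what the compiler derives from the await points of an async fn.
enum TwoStep {
    Start,
    WaitingOnce,
    Done,
}

impl Future for TwoStep {
    type Output = u32;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        // TwoStep has no self-references, so it is Unpin and get_mut is allowed.
        let this = self.get_mut();
        match *this {
            TwoStep::Start => {
                // Pretend we are waiting on I/O: record where to resume,
                // ask to be polled again, and yield to the executor.
                *this = TwoStep::WaitingOnce;
                cx.waker().wake_by_ref();
                Poll::Pending
            }
            TwoStep::WaitingOnce => {
                *this = TwoStep::Done;
                Poll::Ready(42)
            }
            TwoStep::Done => panic!("polled after completion"),
        }
    }
}

/// Waker that does nothing; enough for an executor that polls in a loop.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Naive executor (an assumption for illustration): polls until ready and
/// reports how many polls it took.
fn run_counting_polls<F: Future>(fut: F) -> (F::Output, usize) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return (out, polls);
        }
    }
}

fn main() {
    let (value, polls) = run_counting_polls(TwoStep::Start);
    println!("value={value}, polls={polls}");
}
```

Every `Pending` is a trip back into the executor loop and every wake-up is another `poll` call that re-dispatches on the stored state. A real runtime adds task queues, wakers that actually schedule, and an I/O reactor on top of this.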

An additional problem with async in Rust right now is that it involves bringing in an async runtime, and giving it control over execution of async functions... but various things like thread spawning, channels, async locks, etc. are not standardized, and are specific per runtime. Which in the real world is always tokio.

So some piece of code you bring in via a crate uses async, and now you're having to fire up a tokio runtime, even though you were potentially not building something that has anything to do with the kinds of things that tokio is targeted at ("scalable" network services).

So even if you find an async runtime that's optimized for some other domain (like glommio or smol or whatever) -- you're unlikely to even be able to use it with whatever famous upstream crate you want, which will have explicit dependencies on tokio.

> If you ever pull up a debugger and step through an async Rust/tokio codebase, you'll get a good sense for what the overhead here we're talking about is.

So I didn't quite do that, but the overhead was interesting to me anyway, and as I was unable to find existing benchmarks (surely they exist?), I instructed computer to create one for me: https://github.com/eras/RustTokioBenchmark

On this wee laptop the numbers are 532 vs 6381 CPU cycles when sending a message (one way) from one async task to another (tokio) or from one kernel thread to another (std::sync::mpsc), when limited to one CPU. (It's limited to one CPU as rdtscp numbers are not comparable between different CPUs; I suppose pinning both threads to their own CPUs and actually measuring end-to-end delay would solve that, but this is what I have now.)

So this was eye-opening to me, as I expected tokio to be even faster! But still, it's more than 10x as fast as the thread-based method. A straight-up callback would still be a lot faster, of course, but it would affect the way you structure your code.

Improvements to methodology accepted via pull requests :).
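For anyone who wants to poke at the thread side of this without cloning the repo, here is a minimal std-only sketch of the idea. It is not the code from the linked benchmark: `min_roundtrip` is a name made up here, it times a full round trip rather than one way, it uses `Instant` (wall-clock time) instead of rdtscp cycle counts, and it does no CPU pinning, so the numbers are not directly comparable to the ones above.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

/// Ping-pong a message between two kernel threads over std::sync::mpsc and
/// return the smallest round-trip time seen. Returns Duration::MAX if
/// `rounds` is zero.
fn min_roundtrip(rounds: u32) -> Duration {
    let (to_worker, from_main) = mpsc::channel::<u32>();
    let (to_main, from_worker) = mpsc::channel::<u32>();

    // Worker thread: echo every message back until the sender is dropped.
    let worker = thread::spawn(move || {
        for msg in from_main {
            if to_main.send(msg).is_err() {
                break;
            }
        }
    });

    let mut best = Duration::MAX;
    for i in 0..rounds {
        let start = Instant::now();
        to_worker.send(i).expect("worker thread is gone");
        let echoed = from_worker.recv().expect("worker thread is gone");
        let elapsed = start.elapsed();
        assert_eq!(echoed, i);
        best = best.min(elapsed);
    }

    // Dropping the sender ends the worker's receive loop.
    drop(to_worker);
    worker.join().expect("worker thread panicked");
    best
}

fn main() {
    let best = min_roundtrip(100_000);
    println!("min roundtrip: {best:?}");
}
```

Running it under `taskset` with a single CPU should roughly mirror the setup described above, since each hop then forces a switch between the two threads.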

  • I'd want to see perf stats on branch prediction misses and L1 cache evictions alongside that though. CPU cycles on their own aren't enough.

    • It doesn't seem that my perf provides a metric for L1 cache evictions (per perf list).

      Here are the results for 100000 rounds, though, from taskset 1 perf record -F10000 -e branch-misses -e cache-misses -e cache-references target/release/RustTokioBenchmark (a)sync; perf report --stat:

      async

          Task 2 min roundtrip time: 532
          [ perf record: Woken up 1 times to write data ]
          [ perf record: Captured and wrote 0,033 MB perf.data (117 samples) ]
      
          ...    
          branch-misses stats:
                    SAMPLE events:         54
          cache-misses stats:
                    SAMPLE events:         27
          cache-references stats:
                    SAMPLE events:         36
      

      sync

          Thread 2 min roundtrip time: 7096
          [ perf record: Woken up 5584 times to write data ]
          [ perf record: Captured and wrote 0,367 MB perf.data (7418 samples) ]
      
          ...
          branch-misses stats:
                    SAMPLE events:       6577
          cache-misses stats:
                    SAMPLE events:        159
          cache-references stats:
                    SAMPLE events:        682


> Putting aside that not all of us are building web applications

Perfect moment to mention "rouille" which is a very lightweight synchronous web server framework. So even when you decide to build some web application you do not necessarily have to go down the tokio/async route. I have been using it for a while at work and for private projects and it turned out to be pretty eye-opening.
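For a sense of what the synchronous route looks like underneath, here is a rough thread-per-connection sketch using only std. To be clear, this is not rouille's API; `response_for`, `handle`, `serve`, and `fetch` are names invented for this example, and the HTTP handling is deliberately crude (it looks at the request line only).

```rust
use std::io::{BufRead, BufReader, Read, Write};
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::thread;

/// Build a complete HTTP/1.1 response for a request line like "GET / HTTP/1.1".
fn response_for(request_line: &str) -> String {
    let mut parts = request_line.split_whitespace();
    let (status, body) = match (parts.next(), parts.next()) {
        (Some("GET"), Some("/")) => ("200 OK", "hello from a plain thread\n"),
        (Some("GET"), Some(_)) => ("404 Not Found", "not found\n"),
        _ => ("400 Bad Request", "bad request\n"),
    };
    format!(
        "HTTP/1.1 {status}\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{body}",
        body.len()
    )
}

/// Blocking handler: read the request line, skip the headers, write the reply.
fn handle(mut stream: TcpStream) -> std::io::Result<()> {
    let mut reader = BufReader::new(stream.try_clone()?);
    let mut request_line = String::new();
    reader.read_line(&mut request_line)?;
    let mut header = String::new();
    loop {
        header.clear();
        if reader.read_line(&mut header)? == 0 || header == "\r\n" {
            break;
        }
    }
    stream.write_all(response_for(&request_line).as_bytes())
}

/// Accept `max_conns` connections, one OS thread each, then return.
fn serve(listener: TcpListener, max_conns: usize) -> std::io::Result<()> {
    let mut workers = Vec::new();
    for stream in listener.incoming().take(max_conns) {
        let stream = stream?;
        workers.push(thread::spawn(move || handle(stream)));
    }
    for worker in workers {
        worker.join().expect("handler thread panicked")?;
    }
    Ok(())
}

/// Tiny blocking client used only to exercise the server in this demo.
fn fetch(addr: SocketAddr, path: &str) -> std::io::Result<String> {
    let mut stream = TcpStream::connect(addr)?;
    write!(stream, "GET {path} HTTP/1.1\r\nHost: localhost\r\n\r\n")?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(response)
}

fn main() -> std::io::Result<()> {
    // Port 0 asks the OS for any free port, so the demo does not collide.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    let server = thread::spawn(move || serve(listener, 1));
    print!("{}", fetch(addr, "/")?);
    server.join().expect("server thread panicked")
}
```

No runtime, no executor, and the control flow reads top to bottom. A real framework would add a bounded thread pool, proper parsing, and timeouts, but the basic model stays this simple.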

Hit the nail on the head.

Unless you're really dealing with absurd numbers of simultaneous blocking I/O, async has entirely too many drawbacks.

>now you're having to fire up a tokio runtime

I've been developing in (mostly async) Rust professionally for about a year -- I haven't written much sync Rust other than my learning projects and a raytracer I'm working on, but what are the kinds of common dependencies that pose this problem? Like wanting to use reqwest or things like that?

  • > Like wanting to use reqwest or things like that?

    Yes. Reqwest cranks up Tokio. The amount of stuff it does for a single web request is rather large. It cranks up a thread pool, does the request, and if there's nothing else going on, shuts down the thread pool after a while. That whole reqwest/hyper/tokio stack is intended to "scale", and it's massive overkill for something that's not making large numbers of requests.

    There's "ureq", if you don't want Tokio client side. Does blocking HTTP/HTTPS requests. Will set up a reusable connection pool if you want one.