Comment by littlestymaar

9 months ago

> at the cost of latency and throughput.

Compared to what?

Doing epoll manually?

A reactor has to move the pending task to some type of work queue. The task then has to be pulled off the work queue. The work queue is oblivious to the priority of your tasks. Tasks aren't as expensive as context switches, but they aren't free either: e.g. they're likely to ruin CPU caches. Less code is fewer instructions is less time.

If you care enough, you generally should be able to outdo the reactor and state machines. Whether you should care enough is debatable.
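To make the queue point concrete, here is a minimal sketch of a FIFO run queue. `Task`, `RunQueue`, `wake`, `next` and `drain_order` are hypothetical names for illustration, not any real runtime's API, and real executors are more elaborate than this:

```rust
use std::collections::VecDeque;

/// Hypothetical task record; `priority` is what the application cares about.
#[derive(Debug, Clone, PartialEq)]
struct Task {
    name: &'static str,
    priority: u8,
}

/// Assumed minimal run queue: plain FIFO, no notion of priority.
struct RunQueue {
    ready: VecDeque<Task>,
}

impl RunQueue {
    fn new() -> Self {
        RunQueue { ready: VecDeque::new() }
    }

    /// Reactor side: an event made `task` runnable, so it goes to the back.
    fn wake(&mut self, task: Task) {
        self.ready.push_back(task);
    }

    /// Executor side: take whichever task was woken first.
    fn next(&mut self) -> Option<Task> {
        self.ready.pop_front()
    }
}

/// Drains the queue and reports the order in which tasks would run.
fn drain_order(queue: &mut RunQueue) -> Vec<(&'static str, u8)> {
    let mut order = Vec::new();
    while let Some(task) = queue.next() {
        order.push((task.name, task.priority));
    }
    order
}
```

Waking a priority-0 task and then a priority-9 task runs them in exactly that order: the queue only knows wake order. A hand-rolled epoll loop could choose to service the urgent descriptor first.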

  • The cache thing is a thing I think a lot of people with a more... naive... understanding of machine architecture don't clue into.

    Even just synchronizing on an atomic can thrash branch prediction and L1 caches both, let alone working your way through a task queue and interrupting program flow to do so.

  • So yeah, you're thinking about the comparison between async/await and manual state machine management with epoll. But that's not what most people have in mind when you say async/await has a performance impact; most of them would immediately think you're talking about the difference with threads.

If I'm not doing slow blocking I/O, I'm not doing epoll anyways.

But the moment somebody drops async into my codebase, yay, now I get to pay the cost.

  • Either you are doing slow IO (in one of your dependencies) or you don't have anyone dropping async in your code though…

Threading, probably.

  • Async/await isn't related to threading (although many users and implementations confuse them); it's a way of transforming a function into a suspendable state machine.

    • Games need async/await for two main reasons:

      - coding multi-frame logic in a straightforward way, which is when transforming a function into a suspendable state machine makes sense

      - using more cores because you're CPU-bound, which is literally multithreading

      Both cases can be covered by other approaches, though:

      - submitting multi-frame logic as job parameters to a separate system (e.g., tweening)

      - using data parallelism for CPU-intensive work
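For the "suspendable state machine" part, a hand-written sketch of roughly what such a transformation produces, with no threads involved. `FadeOut`, `NoopWake` and `run_frames` are made-up names, and the example assumes a game loop that polls once per frame, so the waker is deliberately ignored:

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical multi-frame action written as an explicit state machine.
enum FadeOut {
    Waiting { frames_left: u32 },
    Done,
}

impl Future for FadeOut {
    type Output = &'static str;

    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Self::Output> {
        let this = self.get_mut(); // allowed because FadeOut is Unpin
        match *this {
            FadeOut::Waiting { frames_left: 0 } => {
                *this = FadeOut::Done;
                Poll::Ready("faded")
            }
            FadeOut::Waiting { frames_left } => {
                // Suspend: remember where we are and hand control back.
                // Assumption: the caller polls again next frame, so no wake-up is scheduled.
                *this = FadeOut::Waiting { frames_left: frames_left - 1 };
                Poll::Pending
            }
            FadeOut::Done => panic!("polled after completion"),
        }
    }
}

/// Waker that does nothing; enough for a loop that polls unconditionally.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls once per simulated frame on the calling thread.
/// Returns the number of polls it took and the future's output.
fn run_frames<F: Future + Unpin>(mut fut: F) -> (u32, F::Output) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut frames = 0;
    loop {
        frames += 1;
        if let Poll::Ready(out) = Pin::new(&mut fut).poll(&mut cx) {
            return (frames, out);
        }
    }
}
```

Calling `run_frames(FadeOut::Waiting { frames_left: 3 })` suspends three times and completes on the fourth poll, all on a single thread.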

  • I don't think so, because there isn't a performance drawback compared to threads when using async. In fact there's literally nothing preventing you from using a thread per task as your future runtime and just blocking on `.await` (and implementing something like that is a common introduction to how async executors run under the hood so it's not particularly convoluted).

    Sure there's no reason to do that, because non-blocking syscalls are just better, but you can…

    • > I don't think so, because there isn't a performance drawback compared to threads when using async.

      There is. When you write async functions, they get split into state machines and units of non-blocking work which need to be added and taken from work queues. None of this has to happen if you just spawn an OS thread and tell it "execute this function". No state machine, no work queue. It's literally just another sequential program that can do blocking I/O independently of your main thread.

      If you insist on implementing a thread-based solution in exactly the same way that an async solution would, then yes, they'll both pay the price of the convoluted runtime. The point is, there's no need to do that.
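For reference, the thread-per-task "runtime" discussed in this subthread can be sketched with the standard library alone. `ThreadWaker`, `block_on`, `spawn_task`, `task` and `plain_task` are illustrative names, not an existing crate's API; this is a minimal sketch, not a production executor:

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Waker that unparks the thread blocked in `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Drives one future to completion, parking the current thread while it is pending.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            // park() may return spuriously; the loop just polls again.
            Poll::Pending => thread::park(),
        }
    }
}

/// One OS thread per task, each blocking on its own future.
fn spawn_task<F>(fut: F) -> thread::JoinHandle<F::Output>
where
    F: Future + Send + 'static,
    F::Output: Send + 'static,
{
    thread::spawn(move || block_on(fut))
}

async fn double(x: u32) -> u32 {
    x * 2
}

/// Async version: compiled into a state machine that `block_on` has to poll.
async fn task(x: u32) -> u32 {
    double(x).await + 1
}

/// Plain version from the reply above: no future, no poll loop, just a function.
fn plain_task(x: u32) -> u32 {
    x * 2 + 1
}
```

`spawn_task(task(20)).join()` and `thread::spawn(|| plain_task(20)).join()` produce the same value. The second path skips the waker and the poll loop entirely, which is the overhead being argued about; how much that matters in practice is the open question.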