Comment by nananana9

10 days ago

> the thread sleeps until ready and the kernel abstracts it away.

Sure, but once you involve the kernel and OS scheduler, things get 3 to 4 orders of magnitude slower than they should be.

The last time I was working on our coroutine/scheduling code creating and joining a thread that exited instantly was ~200us, and creating one of our green threads, scheduling it and waiting for it was ~400ns.

You don't need to wait 10 years for someone else to design yet another absurdly complex async framework; you can roll your own green threads/stackful coroutines in any systems language with 20 lines of ASM.

1. Why can’t we have better green threads implementations with better scheduling models?

2. Unchecked array operations are a lot faster. Manual memory management is a lot faster. Shared memory is a lot faster.

Usually when you see someone reach for sharp and less expressive tools it’s justified by a hot code path. But here we jump immediately to the perf hack?

3. How many simultaneous async operations does your program have?

  • Well, if you offload heavy compute into an async task, then usually it depends strictly on how many concurrent inputs you are given. But even something as “simple” as a text editor benefits from this if done well - that’s why JS text editors have reasonably acceptable performance whereas Java IDEs have always struggled (historically, anyway, since even Java has now adopted green threads).

    • Are you sure Java's UI issues are caused by threading and not just Swing being a glitchy pile of junk?

      For example, if you don't explicitly call the java.awt.Toolkit.sync() method after updating the UI state (which according to the docs "is useful for animation"), Swing will in my experience introduce seemingly random delays and UI lag because it just doesn't bother sending the UI updates to the window system.


You also involve the kernel when you are doing async IO.

In this context the interesting thing to measure would be doing IO in your green threads vs OS threads.

A stronger theoretical performance argument for async IO is that you can batch, à la io_uring, and do fewer protection-domain crossings per IO that way.

  • Well yeah, of course - using APIs like io_uring and Grand Central Dispatch is basically the whole point of all this async stuff in a systems programming language. It’s absurd it hasn’t been mentioned more here.

    OS threads are for compute parallelism; async with stackless coroutines (ideally) or green threads is for IO parallelism. It’s pretty straightforward.

    And IMO, Zig has shown how to do async IO right (the foundational stuff; other languages could add better syntax for ergonomics).

    • It's not the whole point, there's lots of other (albeit smaller) gains to be had once you have a strong async apparatus.

      The core of your async implementation doesn't have to care about I/O - as long as it has a way to block/schedule fibers, it's easy to implement io_uring/IOCP-based I/O on top of it: stick a single IO poll in your main loop and, when you get a result, schedule the fiber that's waiting for it.

      Another thing you get almost for free is an accurate Sleep(0.3) - your Sleep pushes the current fiber into a global vector along with the time it should be resumed, and you loop over that vector in your main loop.

      We're writing a game engine, so WaitForNextFrame() is another useful one - the implementation is literally pushing the current fiber to a vector and resuming it on the next tick.