← Back to context

Comment by newpavlov

1 day ago

This actually one of my many gripes about Rust async and why I consider it a bad addition to the language in the long term. The fundamental problem is that rust async was developed when epoll was dominant (and almost no one in the Rust circles cared about IOCP) and it has heavily influenced the async design (sometimes indirectly through other languages).

Think about it for a second. Why do we not have this problem with "synchronous" syscalls? When you call `read` you also "pass mutable borrow" of the buffer to the kernel, but it maps well into the Rust ownership/borrow model since the syscall blocks execution of the thread and there are no ways to prevent it in user code. With poll-based async model you side-step this issues since you use the same "sync" syscalls, but which are guaranteed to return without blocking.

For a completion-based IO to work properly with the ownership/borrow model we have to guarantee that the task code will not continue execution until it receives a completion event. You simply can not do it with state machines polled in user code. But the threading model fits here perfectly! If we are to replace threads with "green" threads, user Rust code will look indistinguishable from "synchronous" code. And no, the green threads model can work properly on embedded systems as demonstrated by many RTOSes.

There are several ways of how we could've done it without making the async runtime mandatory for all targets (the main reason why green threads were removed from Rust 1.0). My personal favorite is introduction of separate "async" targets.

Unfortunately, the Rust language developers made a bet on the unproved polling stackless model because of the promised efficiency and we are in the process of finding out whether the bet plays of or not.

> You simply can not do it with state machines polled in user code

That's not really true. The only guarantees in Rust futures are that they are polled() once and must have their Waker's wake() called before they are polled again. A completion based future submits the request on first poll and calls wake() on completion. That's kind of the interesting design of futures in Rust - they support polling and completion.

The real conundrum is that the futures are not really portable across executors. For io_using for example, the executor's event loop is tightly coupled with submission and completion. And due to instability of a few features (async trait, return impl trait in trait, etc) there is not really a standard way to write executor independent async code (you can, some big crates do, but it's not necessarily trivial).

Combine that with the fact that container runtimes disable io_uring by default and most people are deploying async web servers in Docker containers, it's easy to see why development has stalled.

It's also unfair to mischaracterize design goals and ideas from 2016 with how the ecosystem evolved over the last decade, particularly after futures were stabilized before other language items and major executors became popular. If you look at the RFCs and blog posts back then (eg: https://aturon.github.io/tech/2016/09/07/futures-design/) you can see why readiness was chosen over completion, and how completion can be represented with readiness. He even calls out how naïve completion (callbacks) leads to more allocation on future composition and points to where green threads were abandoned.

  • No, the fundamental problem (in the context of io-uring) is that futures are managed by user code and can be dropped at any time. This often referred as "cancellation safety". Imagine a future has initialized completion-based IO with buffer which is part of the future state. User code can simply drop the future (e.g. if it was part of `select!`) and now we have a huge problem on our hands: the kernel will write into a dropped buffer! In the synchronous context it's equivalent to de-allocating thread stack under foot of the thread which is blocked on a synchronous syscall. You obviously can do it (using safe code) in thread-based code, but it's fine to do in async.

    This is why you have to use various hacks when using io-uring based executors with Rust async (like using polling mode or ring-owned buffers and additional data copies). It could be "resolved" on the language level with an additional pile of hacks which would implement async Drop, but, in my opinion, it would only further hurt consistency of the language.

    >He even calls out how naïve completion (callbacks) leads to more allocation on future composition and points to where green threads were abandoned.

    I already addressed it in the other comment.

    • I really don’t understand this argument. If you force the user to transfer ownership of the buffer into the I/O subsystem, the system can make sure to transfer ownership of the buffer into the async runtime, not leaving it held within the cancellable future and the future returns that buffer which is given back when the completion is received from the kernel. What am I missing?

      10 replies →

    • That problem exists regardless of whether you want to use stackful coroutines or not. The stack could be freed by user code at anytime. It could also panic and drop buffers upon unwinding.

      I wouldn't call async drop a pile of hacks, it's actually something that would be useful in this context.

      And that said there's an easy fix: don't use the pointers supplied by the future!

      9 replies →

  • > The only guarantees in Rust futures are that they are polled() once and must have their Waker's wake() called before they are polled again.

    I just had to double-check as this sounded strange to me, and no that's not true.

    The most efficient design is to do it that way, yes, but there are no guarantees of that sort. If one wants to build a less efficient executor, it's perfectly permissible to just poll futures on a tight loop without involving the Waker at all.

    • Let me rephrase, there's no guarantee that a poll() is called again (because of cancel safety) and in practice you have to call wake() because executors won't reschedule the task unless one of their children wake()s

  • > And due to instability of a few features (async trait, return impl trait in trait, etc) there is not really a standard way to write executor independent async code (you can, some big crates do, but it's not necessarily trivial).

    Uhm all of that is just sugar on top of stable feature. None of these features or lack off prevent portability.

    Full portability isn't possible specifically due to how Waker works (i.e. is implementation specific). That allows async to work with different style of asyncs. Reason why io_uring is hard in rust is because of io_uring way of dealing with memory.

> The fundamental problem is that rust async was developed when epoll was dominant (and almost no one in the Rust circles cared about IOCP)

No, this is a mistaken retelling of history. The Rust developers were not ignorant of IOCP, nor were they zealous about any specific async model. They went looking for a model that fit with Rust's ethos, and completion didn't fit. Aaron Turon has an illuminating post from 2016 explaining their reasoning: https://aturon.github.io/tech/2016/09/07/futures-design/

See the section "Defining futures":

There’s a very standard way to describe futures, which we found in every existing futures implementation we inspected: as a function that subscribes a callback for notification that the future is complete.

Note: In the async I/O world, this kind of interface is sometimes referred to as completion-based, because events are signaled on completion of operations; Windows’s IOCP is based on this model.

[...] Unfortunately, this approach nevertheless forces allocation at almost every point of future composition, and often imposes dynamic dispatch, despite our best efforts to avoid such overhead.

[...] TL;DR, we were unable to make the “standard” future abstraction provide zero-cost composition of futures, and we know of no “standard” implementation that does so.

[...] After much soul-searching, we arrived at a new “demand-driven” definition of futures.

I'm not sure where this meme came from where people seem to think that the Rust devs rejected a completion-based scheme because of some emotional affinity for epoll. They spent a long time thinking about the problem, and came up with a solution that worked best for Rust's goals. The existence of a usable io_uring in 2016 wouldn't have changed the fundamental calculus.

  • >which we found in every existing futures implementation we inspected

    This is exactly what I meant when I wrote about the indirect influence from other languages. People may dress it up as much as they want, but it's clear that polling was the most important model at the time (outside of the Windows world) and a lot of design consideration was put into being compatible with it. The Rust async model literally uses the polling terminology in its most fundamental interfaces!

    >this approach nevertheless forces allocation at almost every point of future composition

    This is only true in the narrow world of modeling async execution with futures. Do you see heap allocations in Go on each equivalent of "future composition" (i.e. every function call)? No, you do not. With the stackfull models you allocate a full stack for your task and you model function calls as plain function calls without any future composition shenaniganry.

    Yes, the stackless model is more efficient memory-wise and allows for some additional useful tricks (like sharing future stacks in `join!`). But the stackfull model is perfectly efficient for 95+% of use cases, fits better with the borrow/ownership model, does not result in the `.await` noise, does not lead to the horrible ecosystem split (including split between different executors), and does not need the language-breaking hacks like `Pin` (see the `noalias` exception made for it). And I believe it's possible to close the memory efficiency gap between the models with certain compiler improvements (tracking maximum stack usage bound for functions and introducing a separate async ABI with two separate stacks).

    >The existence of a usable io_uring in 2016 wouldn't have changed the fundamental calculus.

    IIRC the first usable versions of io-uring very released approximately during the time when the Rust async was undergoing stabilization. I am really confident that if the async system was designed today we would've had a totally different model. Importance of completion-based models has only grown since then not only because of the sane async file IO, but also because of Spectre and Meltdown.

    • > But the stackfull model is

      The existence of advantages doesn't change anything here. The problems is that the disadvantages made this approach a non-starter, despite a lot of effort to make it work. Tradeoffs exist in language design, and the approaches were judged accordingly. What works for Go doesn't necessarily work for Rust, because they target different domains.

      > I am really confident that if the async system was designed today we would've had a totally different model

      No, without solving the original problems, the outcome would be the same. The Rust devs at the time were well aware of io_uring.

      8 replies →

  • That post explicitly stated one of the goals was to avoid requiring heap allocations. But fundamentally io_uring is incompatible with the stack and in practice coding against it requires dynamic allocations. If that would be known 10 years ago, surely it would have influenced the design goals.

genuinely so sad to me that you are still grinding this axe. if your fantasy design works so much better - go build it then!

  • Deal with it. Async is my greatest disappointment in the otherwise mostly stellar language. And I will continue to argue strongly against it.

    After Rust has raised the level of quality and expectations to such great level, async feels like 3 steps back with all those arguments "you are holding it wrong", footguns, and piles of hacks. And this sentiment is shared by many others. It's really disappointing to see how many resources are getting sunk into the flawed async model by both the language and the ecosystem developers.

    >go build it then

    I did build it and it's in the process of being adopted into a proprietary database (theoretically a prime use-case for async Rust). Sadly, because I don't have ways to change the language and the compiler, it has obvious limitations (and generally it can be called unsound, especially around thread locals). It works for our project only because we have a tightly controlled code base. In future I plan to create a custom "green-thread" fork of `std` to ease limitations a bit. Because of the limitations (and the proprietary nature of the project) it is unlikely to be published as an open source project.

    Amusingly, during online discussions I've seen other unrelated people who done similar stuff.

    • > it has obvious limitations (and generally it can be called unsound, especially around thread locals)

      Is this really better than what we have now? I don't think async is perfect, but I can see what tradeoffs they are currently making and how they plan to address most if not all of them. "General" unsoundness seems like a rather large downside.

      > In future I plan to create a custom "green-thread" fork of `std` to ease limitations a bit

      Can you go more in-depth into these limitations and which would be alleviated by having first class support for your approach in the compiler/std?

      2 replies →