Comment by Seattle3503
1 day ago
> For example when submitting a write operation, the memory location of those bytes must not be deallocated or overwritten.
> The io-uring crate doesn’t help much with this. The API doesn’t allow the borrow checker to protect you at compile time, and I don’t see it doing any runtime checks either.
I've seen comments like this before[1], and I get the impression that building a safe async Rust library around io_uring is actually quite difficult. Which is sort of a bummer.
IIRC Alice from the tokio team also suggested there hasn't been much interest in pushing through these difficulties more recently, as the current performance is "good enough".
This is actually one of my many gripes about Rust async, and why I consider it a bad addition to the language in the long term. The fundamental problem is that Rust async was developed when epoll was dominant (and almost no one in Rust circles cared about IOCP), and that heavily influenced the async design (sometimes indirectly, through other languages).
Think about it for a second. Why do we not have this problem with "synchronous" syscalls? When you call `read` you also "pass a mutable borrow" of the buffer to the kernel, but it maps well onto the Rust ownership/borrow model, since the syscall blocks execution of the thread and there is no way to prevent that in user code. With a poll-based async model you sidestep this issue, since you use the same "sync" syscalls, just ones that are guaranteed to return without blocking.
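Concretely, the synchronous case (a trivial std-only example):

```rust
use std::fs::File;
use std::io::Read;

fn main() -> std::io::Result<()> {
    let mut file = File::open("/etc/hostname")?;
    let mut buf = [0u8; 4096];
    // The kernel writes into `buf` while this thread is blocked inside
    // read(); safe Rust has no way to free or reuse the buffer mid-call,
    // so the mutable borrow maps cleanly onto the syscall's lifetime.
    let n = file.read(&mut buf)?;
    println!("{}", String::from_utf8_lossy(&buf[..n]));
    Ok(())
}
```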
For completion-based IO to work properly with the ownership/borrow model, we have to guarantee that the task code will not continue execution until it receives a completion event. You simply cannot do that with state machines polled in user code. But the threading model fits here perfectly! If we were to replace threads with "green" threads, user Rust code would look indistinguishable from "synchronous" code. And no, green threads are not a problem for embedded systems; the model can work properly there, as demonstrated by many RTOSes.
There are several ways we could have done it without making the async runtime mandatory for all targets (the main reason green threads were removed before Rust 1.0). My personal favorite is the introduction of separate "async" targets.
Unfortunately, the Rust language developers made a bet on the unproven polling stackless model because of its promised efficiency, and we are in the process of finding out whether the bet pays off or not.
> You simply can not do it with state machines polled in user code
That's not really true. The only guarantees in Rust futures are that they are polled() once and must have their Waker's wake() called before they are polled again. A completion-based future submits the request on first poll and calls wake() on completion. That's kind of the interesting design of futures in Rust: they support both polling and completion.
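Roughly this shape, as a minimal std-only sketch (a spawned thread stands in for the kernel's completion path; all names here are made up):

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Waker};

// State shared between the future and whatever delivers the completion.
struct Shared {
    result: Option<u64>,
    waker: Option<Waker>,
}

struct CompletionRead {
    shared: Arc<Mutex<Shared>>,
    submitted: bool,
}

impl Future for CompletionRead {
    type Output = u64;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u64> {
        let this = &mut *self; // CompletionRead is Unpin
        let mut shared = this.shared.lock().unwrap();
        if let Some(res) = shared.result.take() {
            return Poll::Ready(res); // completion already arrived
        }
        // Store the latest waker *before* submitting, to avoid a lost wakeup.
        shared.waker = Some(cx.waker().clone());
        if !this.submitted {
            this.submitted = true;
            // First poll: "submit" the operation; a thread plays the kernel.
            let shared = Arc::clone(&this.shared);
            std::thread::spawn(move || {
                std::thread::sleep(std::time::Duration::from_millis(10));
                let mut s = shared.lock().unwrap();
                s.result = Some(42);
                if let Some(w) = s.waker.take() {
                    w.wake(); // completion event -> reschedule the task
                }
            });
        }
        Poll::Pending
    }
}
```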
The real conundrum is that futures are not really portable across executors. For io_uring, for example, the executor's event loop is tightly coupled with submission and completion. And due to the instability of a few features (async trait, return impl trait in trait, etc.) there is not really a standard way to write executor-independent async code (you can, and some big crates do, but it's not necessarily trivial).
Combine that with the fact that container runtimes disable io_uring by default while most people are deploying async web servers in Docker containers, and it's easy to see why development has stalled.
It's also unfair to judge the design goals and ideas of 2016 against how the ecosystem evolved over the following decade, particularly after futures were stabilized before other language items and the major executors became popular. If you look at the RFCs and blog posts from back then (e.g. https://aturon.github.io/tech/2016/09/07/futures-design/) you can see why readiness was chosen over completion, and how completion can be represented with readiness. He even calls out how naïve completion (callbacks) leads to more allocation on future composition, and points to where green threads were abandoned.
No, the fundamental problem (in the context of io_uring) is that futures are managed by user code and can be dropped at any time. This is often referred to as "cancellation safety". Imagine a future has initiated completion-based IO with a buffer that is part of the future's state. User code can simply drop the future (e.g. if it was part of a `select!`) and now we have a huge problem on our hands: the kernel will write into a dropped buffer! In the synchronous world the equivalent would be deallocating a thread's stack out from under a thread blocked on a synchronous syscall. You obviously cannot do that in safe thread-based code, but it's perfectly fine to do in async.
This is why you have to use various hacks when using io_uring-based executors with Rust async (like using polling mode, or ring-owned buffers plus additional data copies). It could be "resolved" at the language level with an additional pile of hacks implementing async Drop, but, in my opinion, that would only further hurt the consistency of the language.
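For reference, the owned-buffer style looks like this with tokio-uring (this is essentially the example from its docs, linked elsewhere in the thread): the buffer is passed by value and handed back alongside the result, so a dropped future leaves the buffer with the runtime rather than freeing it under the kernel.

```rust
use tokio_uring::fs::File;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    tokio_uring::start(async {
        let file = File::open("hello.txt").await?;

        let buf = vec![0u8; 4096];
        // Ownership of `buf` moves into the runtime for the duration of
        // the operation; on completion we get it back with the result.
        let (res, buf) = file.read_at(buf, 0).await;
        let n = res?;
        println!("read {} bytes: {:?}", n, &buf[..n]);
        Ok(())
    })
}
```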
>He even calls out how naïve completion (callbacks) leads to more allocation on future composition and points to where green threads were abandoned.
I already addressed it in the other comment.
20 replies →
> The only guarantees in Rust futures are that they are polled() once and must have their Waker's wake() called before they are polled again.
I just had to double-check, as this sounded strange to me, and no, that's not true.
The most efficient design is to do it that way, yes, but there are no guarantees of that sort. If one wants to build a less efficient executor, it's perfectly permissible to just poll futures on a tight loop without involving the Waker at all.
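For example, this is a perfectly legal (if wasteful) executor on recent Rust; it never waits for wake() at all:

```rust
use std::future::Future;
use std::pin::pin;
use std::task::{Context, Poll, Waker};

// Busy-polls the future with a no-op waker. Inefficient, but it upholds
// the Future contract: wake() signals readiness, it is not a precondition
// for the executor to poll again.
fn busy_block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let mut cx = Context::from_waker(Waker::noop());
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
        std::thread::yield_now(); // spin without ever using the waker
    }
}
```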
1 reply →
> And due to instability of a few features (async trait, return impl trait in trait, etc) there is not really a standard way to write executor independent async code (you can, some big crates do, but it's not necessarily trivial).
Uhm, all of that is just sugar on top of stable features. None of these features, or the lack thereof, prevent portability.
Full portability isn't possible specifically because of how Waker works (i.e. it is implementation-specific). That is what allows async to work with different styles of async IO. The reason io_uring is hard in Rust is io_uring's way of dealing with memory.
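Concretely, a Waker is just a data pointer plus an executor-supplied vtable, which is what lets it work across completely different runtime designs. A minimal (no-op) one, as a sketch:

```rust
use std::task::{RawWaker, RawWakerVTable, Waker};

// Every executor provides its own vtable functions; this no-op vtable is
// the smallest possible example of the mechanism.
fn noop_raw_waker() -> RawWaker {
    fn clone(_: *const ()) -> RawWaker { noop_raw_waker() }
    fn wake(_: *const ()) {}
    fn wake_by_ref(_: *const ()) {}
    fn drop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, wake, wake_by_ref, drop);
    RawWaker::new(std::ptr::null(), &VTABLE)
}

fn noop_waker() -> Waker {
    // Safe here because the no-op vtable trivially upholds the contract.
    unsafe { Waker::from_raw(noop_raw_waker()) }
}
```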
> The fundamental problem is that rust async was developed when epoll was dominant (and almost no one in the Rust circles cared about IOCP)
No, this is a mistaken retelling of history. The Rust developers were not ignorant of IOCP, nor were they zealous about any specific async model. They went looking for a model that fit with Rust's ethos, and completion didn't fit. Aaron Turon has an illuminating post from 2016 explaining their reasoning: https://aturon.github.io/tech/2016/09/07/futures-design/
See the section "Defining futures":
> There’s a very standard way to describe futures, which we found in every existing futures implementation we inspected: as a function that subscribes a callback for notification that the future is complete.
> Note: In the async I/O world, this kind of interface is sometimes referred to as completion-based, because events are signaled on completion of operations; Windows’s IOCP is based on this model.
> [...] Unfortunately, this approach nevertheless forces allocation at almost every point of future composition, and often imposes dynamic dispatch, despite our best efforts to avoid such overhead.
> [...] TL;DR, we were unable to make the “standard” future abstraction provide zero-cost composition of futures, and we know of no “standard” implementation that does so.
> [...] After much soul-searching, we arrived at a new “demand-driven” definition of futures.
I'm not sure where this meme came from where people seem to think that the Rust devs rejected a completion-based scheme because of some emotional affinity for epoll. They spent a long time thinking about the problem, and came up with a solution that worked best for Rust's goals. The existence of a usable io_uring in 2016 wouldn't have changed the fundamental calculus.
>which we found in every existing futures implementation we inspected
This is exactly what I meant when I wrote about the indirect influence from other languages. People may dress it up as much as they want, but it's clear that polling was the most important model at the time (outside of the Windows world) and a lot of design consideration was put into being compatible with it. The Rust async model literally uses the polling terminology in its most fundamental interfaces!
>this approach nevertheless forces allocation at almost every point of future composition
This is only true in the narrow world of modeling async execution with futures. Do you see heap allocations in Go on each equivalent of "future composition" (i.e. every function call)? No, you do not. With stackful models you allocate a full stack for your task, and function calls are modeled as plain function calls, without any future-composition shenanigans.
Yes, the stackless model is more memory-efficient and allows for some additional useful tricks (like sharing future stacks in `join!`). But the stackful model is perfectly efficient for 95+% of use cases, fits better with the borrow/ownership model, does not result in the `.await` noise, does not lead to the horrible ecosystem split (including the split between different executors), and does not need language-breaking hacks like `Pin` (see the `noalias` exception made for it). And I believe it's possible to close the memory-efficiency gap between the models with certain compiler improvements (tracking a maximum stack usage bound for functions and introducing a separate async ABI with two separate stacks).
>The existence of a usable io_uring in 2016 wouldn't have changed the fundamental calculus.
IIRC the first usable versions of io_uring were released at roughly the time Rust async was undergoing stabilization. I am really confident that if the async system were designed today, we would have a totally different model. The importance of completion-based models has only grown since then, not only because of sane async file IO, but also because of Spectre and Meltdown.
9 replies →
That post explicitly stated that one of the goals was to avoid requiring heap allocations. But io_uring is fundamentally incompatible with the stack, and in practice coding against it requires dynamic allocations. Had that been known 10 years ago, it surely would have influenced the design goals.
genuinely so sad to me that you are still grinding this axe. if your fantasy design works so much better - go build it then!
Deal with it. Async is my greatest disappointment in an otherwise mostly stellar language. And I will continue to argue strongly against it.
After Rust has raised the level of quality and expectations so high, async feels like three steps back, with all the "you are holding it wrong" arguments, footguns, and piles of hacks. And this sentiment is shared by many others. It's really disappointing to see how many resources are being sunk into the flawed async model by both the language and ecosystem developers.
>go build it then
I did build it, and it's in the process of being adopted into a proprietary database (theoretically a prime use case for async Rust). Sadly, because I don't have a way to change the language and the compiler, it has obvious limitations (and generally can be called unsound, especially around thread locals). It works for our project only because we have a tightly controlled code base. In the future I plan to create a custom "green-thread" fork of `std` to ease the limitations a bit. Because of the limitations (and the proprietary nature of the project) it is unlikely to be published as open source.
Amusingly, during online discussions I've seen other, unrelated people who have done similar things.
4 replies →
There is, I think, an ownership model that Rust's borrow checker supports very poorly; for lack of a better name, I've called it hot-potato ownership. The basic idea is that you have a buffer whose ownership you can give out, in the expectation that the recipient will (eventually) give it back to you. It's a sort of non-lexical borrowing problem, and I very quickly discovered when trying to implement it myself in purely safe Rust that the "giving the buffer back" part is really gnarly to write.
This can be done with exclusively owned objects. That's how io_uring abstractions work in Rust: you give your (heap-allocated) buffer to a buffer pool, and get it back when the operation is done.
&mut references are exclusive and non-copyable, so the hot potato approach can even be used within their scope.
But the problem in Rust is that threads can unwind/exit at any time, invalidating buffers living on the stack, and io_uring may use the buffer for longer than the thread lives.
The borrow checker only checks what code does; it has no power to alter runtime behavior (it's not a GC, after all). So it can prevent io_uring abstractions from accepting on-stack buffers, but it has no power to stop threads from unwinding in order to make on-stack buffers safe instead.
Yes and no.
In my case, I have code that essentially looks like this:
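```rust
// A sketch of the shape (hypothetical reconstruction; the names follow
// the discussion below).
struct ParserState { /* ... */ }

struct Parser {
    state: ParserState,
}

struct Subparser {
    state: ParserState,
}

impl Subparser {
    fn new(state: ParserState) -> Self {
        Subparser { state }
    }
}

impl Parser {
    fn subparse(&mut self) -> Subparser {
        // First line: doesn't compile as written.
        // error[E0507]: cannot move out of `self.state`
        let sub = Subparser::new(self.state);
        // ...`sub` must somehow hand the ParserState back later...
        sub
    }
}
```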
Okay, I can make the first line work by changing Parser.state to an Option<ParserState> and using Option::take (or std::mem::replace with a custom enum; going from an &mut T to a T is possible in a number of ways). But how do I give Subparser the ability to give its ParserState back to the original parser? If I could make Subparser take a lifetime and just hold a pointer to Parser.state, I wouldn't even bother with half of this setup, because I would just reach into the Parser directly, but that's not an option in this case. (The safe Rust option I eventually reached for is a oneshot channel, which is actually a lot of overhead for this case.)
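The oneshot give-back looks roughly like this (std mpsc standing in for a real oneshot channel; names are hypothetical):

```rust
use std::sync::mpsc;

struct ParserState { depth: usize }

struct Subparser {
    state: Option<ParserState>,
    // Used purely to hand the hot potato back to the parent parser.
    give_back: mpsc::Sender<ParserState>,
}

impl Subparser {
    fn finish(mut self) {
        if let Some(state) = self.state.take() {
            let _ = self.give_back.send(state);
        }
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let sub = Subparser { state: Some(ParserState { depth: 1 }), give_back: tx };
    sub.finish();
    // The parent reclaims its state once the subparser is done with it.
    let state = rx.recv().unwrap();
    assert_eq!(state.depth, 1);
}
```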
It's the give-back portion of the borrow-then-give-back pattern that ends up being gnarly. I'm actually somewhat disappointed that the Rust ecosystem has in general given up on trying to build safe pointer abstractions, like use tracking for a pointed-to object. FWIW, a rough C++ implementation of what I would like to do is this:
1 reply →
In my universe, `let` wouldn’t exist… instead there would only be 3 ways to declare variables:
Global types would need to implement a global trait to ensure mutual exclusion (waves hands).
So by having the location of allocation in the type itself, we no longer have to do boxing mental gymnastics.
6 replies →
Maybe I’m misunderstanding, but why is that not possible with a
It totally is
https://docs.rs/tokio-uring/latest/tokio_uring/fs/struct.Fil...
As sibling notes, it is. It's very rarely seen though.
One place you might see something like it is when an API takes ownership but returns it on error; the error side carries the resource you gave it, so you can try again.
How is that different to
3 replies →
RefCell didn't work? Or Rc?
Slapping Rc<T> on something that could clearly be uniquely owned is a sign of very poorly designed lifetime rules / system.
And yes, for now async Rust is full of unnecessary Arc<T> and is very poorly made.
1 reply →
> IIRC Alice from the tokio team also suggested there hasn't been much interest in pushing through these difficulties more recently, as the current performance is "good enough".
Well, I think there is interest, but mostly for file IO.
For file IO, the situation is pretty simple. We already have to implement that using spawn_blocking, and spawn_blocking has the exact same buffer challenges as io_uring does, so translating file IO to io_uring is not that tricky.
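A sketch of why the challenges are the same: the spawn_blocking closure must be 'static and Send, so it cannot borrow a caller's buffer; it has to own its data, exactly like an io_uring submission (read_file here is a hypothetical helper):

```rust
use std::path::PathBuf;

// The closure can't capture `&mut [u8]` from the caller's stack, so the
// buffer is allocated and owned inside the operation and returned by value.
async fn read_file(path: PathBuf) -> std::io::Result<Vec<u8>> {
    tokio::task::spawn_blocking(move || std::fs::read(path))
        .await
        .expect("blocking task panicked")
}
```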
On the other hand, I don't think tokio::net's existing APIs will support io_uring. Or at least they won't support the buffer-based io_uring APIs; there is no reason they can't register for readiness through io_uring.
This covers probably 90% of the usefulness of io_uring for non-niche applications. Its original purpose was doing buffered async file IO without the pile of caveats that made earlier attempts (like Linux AIO) effectively useless. The biggest speedup I've found with it is `stat`ing large sets of files in the VFS cache. It can literally be 50x faster at that, since you can do 1000 files with a single system call and the data you need from the kernel is all in memory.
High-throughput network use cases that don't need/want AF_XDP or DPDK can get most of the speedup with `sendmmsg`/`recvmmsg` and segmentation offload.
For TCP streams, syscall overhead isn't really a big issue; you can easily transfer large chunks of data in each write(). If you have TCP segmentation offload available, you'll have no serious issues pushing 100gbit/s. Also, if you are sending static content, don't forget sendfile().
UDP is a whole other kettle of fish; it gets very complicated above 10gbit/s or so. This is a big part of why QUIC really struggles to scale well on fat pipes [1]. sendmmsg/recvmmsg + UDP GRO/GSO will probably get you to ~30gbit/s, but beyond that it's a real headache. The issue is that UDP is not stream-focused, so you're making a ton of little writes, and the kernel networking stack as of today does a pretty bad job with these workloads.
FWIW even the fastest QUIC implementations cap out at <10gbit/s today [2].
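For reference, the sendmmsg batching mentioned above looks roughly like this (Linux-only; assumes the libc crate and a connected socket, with GSO left out):

```rust
use std::net::UdpSocket;
use std::os::fd::AsRawFd;

// One syscall submits the whole batch of datagrams.
fn send_batch(sock: &UdpSocket, packets: &[&[u8]]) -> std::io::Result<usize> {
    let mut iovecs: Vec<libc::iovec> = packets
        .iter()
        .map(|p| libc::iovec {
            iov_base: p.as_ptr() as *mut libc::c_void,
            iov_len: p.len(),
        })
        .collect();

    let mut msgs: Vec<libc::mmsghdr> = iovecs
        .iter_mut()
        .map(|iov| {
            // Zeroed header: no msg_name (connected socket), no control data.
            let mut m: libc::mmsghdr = unsafe { std::mem::zeroed() };
            m.msg_hdr.msg_iov = iov; // one iovec per datagram
            m.msg_hdr.msg_iovlen = 1;
            m
        })
        .collect();

    let n = unsafe {
        libc::sendmmsg(sock.as_raw_fd(), msgs.as_mut_ptr(), msgs.len() as u32, 0)
    };
    if n < 0 {
        return Err(std::io::Error::last_os_error());
    }
    Ok(n as usize) // number of datagrams handed to the kernel
}
```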
Had a good fight writing a ~20gbit userspace UDP VPN recently. Ended up having to bypass the kernel's networking stack using AF_XDP [3].
I'm available for hire btw, if you've got an interesting networking project feel free to reach out.
1. https://arxiv.org/abs/2310.09423
2. https://microsoft.github.io/msquic/
3. https://github.com/apoxy-dev/icx/blob/main/tunnel/tunnel.go
3 replies →
I think the right way to build a safe interface around io_uring would be to use ring-owned buffers: ask the ring for a buffer when you want one, and give the buffer back to the ring when initiating a write.
This is something that Amos Wenger (fasterthanlime) has worked on: https://github.com/bearcove/loona/blob/main/crates/buffet/RE...
This works perfectly well, and allows using the type system to handle safety. But it also really limits how you handle memory, and makes it impossible to do things like filling out parts of existing objects, so a lot of people are reluctant to take the plunge.
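Stripped down to a sketch, the idea is just this (not buffet's actual API):

```rust
// Buffers only ever move between the pool and in-flight operations, so the
// kernel never holds a pointer to memory the user could free underneath it.
struct BufPool {
    free: Vec<Vec<u8>>,
}

impl BufPool {
    fn get(&mut self) -> Vec<u8> {
        let mut buf = self.free.pop().unwrap_or_default();
        buf.resize(4096, 0); // fixed-size buffers keep the ring side simple
        buf
    }

    // Called from the completion path, once the kernel is done with it.
    fn put(&mut self, buf: Vec<u8>) {
        self.free.push(buf);
    }
}
```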
That's annoying for people writing bespoke low-level networking code, but for a high-level HTTP library it's a rounding error in the overall complexity on display. I think the bigger barrier for Tokio is that the interplay between having an epoll instance and an io_uring instance on the same pool is problematic and can erase the performance gains. Done greenfield, you could implement the "normal" APIs with `IORING_OP_POLL_ADD`, but not all of the exposed `mio` surface area can work this way; only the oneshot API can.
You don’t have to represent everything with borrows. You can just use data structures like Slab to make it cancel safe.
As an example this library I wrote before is cancel safe and doesn’t use lifetimes etc. for it.
https://github.com/steelcake/io2
Just realised my code isn’t cancel safe either. It is invalid if the user just drops a read future and the buffer itself while the operation is in the kernel.
It is just a PITA to get it fully right.
Probably the buffer needs to come from the async library, so the user allocates buffers through it, like a sibling comment says.
It is just much easier to not use Rust, say that futures always run to completion and can't simply be dropped, and make some actual progress. So I'm just doing it in Zig now.
It’s annoying but possible to do this correctly and not have the API be too bad. The “happy path” of a clean success or error is fine if you accept that buffers can’t just be simple &[u8] slices. Cancellation can be handled safely with something like the following API contract:
Have your function signature be `async fn read(buffer: &mut Vec<u8>) -> Result<…>` (you can use something more convenient like `&mut BytesMut` too). If you run the future to completion (success or failure), the argument holds the same buffer passed in, with data filled in appropriately on success. If you cancel/drop the future, the buffer may point at an empty allocation instead (this is usually not an annoying constraint for most IO flows, and the footgun potential is low).
The way this works is that your library "takes" the underlying allocation out of the variable before starting the operation, replacing it with a default, unallocated `Vec<u8>`. Once the buffer is no longer used by the IO system, the library puts it back before returning. If you cancel, it manages the buffer in the background, releasing it when safe, and the unallocated buffer is left in the passed variable.
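A sketch of that contract (submit_read is a stub standing in for the real uring submission, which would keep the buffer alive even if the caller's future is dropped):

```rust
use std::mem;

// Stub for the uring runtime: takes the buffer by value, fills it, and
// returns it together with the result.
async fn submit_read(mut buf: Vec<u8>) -> (std::io::Result<usize>, Vec<u8>) {
    buf.extend_from_slice(b"hello");
    (Ok(5), buf)
}

// The API contract described above: `&mut Vec<u8>` in, allocation "taken"
// for the duration of the operation, put back on completion.
async fn read(buffer: &mut Vec<u8>) -> std::io::Result<usize> {
    // Take the caller's allocation, leaving an empty (unallocated) Vec.
    let owned = mem::take(buffer);
    // If this future is dropped across this await, the caller only ever
    // sees the empty Vec; the runtime owns `owned` until the kernel is done.
    let (res, owned) = submit_read(owned).await;
    // Happy path: put the (filled) buffer back into the caller's variable.
    *buffer = owned;
    res
}
```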
It sounds like this would be better modelled by passing ownership of the buffer and expecting it to be returned on the success (ok) case. What you described doesn't seem compatible with what I would call a mutable borrow (mutate the contents of a Vec<u8>).
Or maybe I've misunderstood?
It is compatible under Rust's model (I've used it to implement safe io_uring interfaces specifically). `&mut Vec<u8>` doesn't just let you mutate contents or extend the allocation; you can call `mem::replace(…)` and swap out the allocation entirely. It's morally equivalent to passing the buffer back and forth, and almost identical in the generated machine code (structure return values look a lot like mutable structure arguments at the register calling-convention level). However, it's much less annoying to work with in practice: passing buffers back and forth and then reassigning them to the same variable name results in a lot of semantically irrelevant code just to please the ownership model.
I wish I could have been paid to work on a SPARK specification around io_uring so that one could have built on it. Or to work on SPARK-to-eBPF (there's already an LLVM backend for GNAT) and have some form of guarantees at the seams... alas.