Comment by mbStavola
1 day ago
> it has obvious limitations (and generally it can be called unsound, especially around thread locals)
Is this really better than what we have now? I don't think async is perfect, but I can see what tradeoffs they are currently making and how they plan to address most if not all of them. "General" unsoundness seems like a rather large downside.
> In future I plan to create a custom "green-thread" fork of `std` to ease limitations a bit
Can you go more in-depth into these limitations and which would be alleviated by having first class support for your approach in the compiler/std?
>Is this really better than what we have now?
Depends on the metric you use. Memory-wise it's a bit less efficient (our tasks usually are quite big, so relative overhead is small in our case), runtime-wise it should be on par or slightly ahead. From the source code perspective, in my opinion, it's much better. We don't have the async/await noise everywhere and after development of the `std` fork we will get async in most dependencies as well for "free" (we still would need to inspect the code to see that they do not use blocking `libc` calls for example). I always found it amusing that people use "sync" `log`-based logging in their async projects, we will not have this problem. The approach also allows migration of tasks across cores even if you keep `Rc` across yield points. And of course we do not need to duplicate traits with their async counterparts and Drop implementations with async operations work properly out of the box.
>Can you go more in-depth into these limitations and which would be alleviated by having first class support for your approach in the compiler/std?
The most obvious example is thread locals. Right now we have to ensure that code does not wait on completion while having a thread local reference (we allow migration of tasks across workers/cores by default). We ban use of thread locals in our code and assume that dependencies are unable to yield into our executor. With forked `std` we can replace the `thread_local!` macro with a task-local implementation which would resolve this issue.
Another source of potential unsoundness is reuse of parent task stack for sub-task stacks in our implementation of `select!`/`join!` (we have separate variants which allocate full stacks for sub-tasks which are used for "fat" sub-tasks). Right now we have to provide stack size for sub-tasks manually and check that the value is correct using external tools (we use raw syscalls for interacting with io-uring and forbid external shared library calls inside sub-tasks). This could be resolved with the aforementioned special async ABI and tracking of maximum stack usage bound.
Finally, our implementation may not work out-of-box on Windows (I read that it has protections against messing with stack pointer on which we rely), but it's not a problem for us since we target only modern Linux.
If you use a custom libc and dynamic linker, you can very easily customize thread locals to work the way you want without forking the standard library.