It is well known that Rust's async model can lead to deadlocks. However, in the futurelock case I have come around to blaming the Async Locks. The issues is that they are not "fair" in that they don't poll the future holding the lock. There may be some other tradeoffs that would happen if the locks were in some way "fairer" but I think they should be explored.
> The issues is that they are not "fair" in that they don't poll the future holding the lock.
How would you change that? A section of code doesn't have access to its own future to call into.
Best I can think is that you can't just call `let guard = mut.lock().await` but instead have to do `mut.do_locked(fut).await`, so that other `do_locked` calls can poll `fut`. I think that would work, but it seems quite awkward. Then again, imho async mutexes are something that should be used quite sparingly, so maybe that's okay.
Disclaimer - I'm not a Tokio dev so what I say may be very wrong. Some definitions:
Future = a structure with a method poll(self: Pin<&mut Self>, ...) -> Poll<Self::Output>; Futures are often composed of other futures and need to poll them.
Tokio task = A top-level future that is driven by the Tokio runtime. These are the only futures that will be run even if not polled.
My understanding is that Tokio async locks have a queue of tasks waiting on lock. When a lock is unlocked, the runtime polls the task at the front of the queue. Futurelock happens when the task locks the lock, then attempts to lock it a second time. This can happen when a sub-future of the top level task already has the lock, then it polls a different future which tries to take the lock.
This situation should be detectable because Tokio tracks which task is holding an async lock. One improvement could be to panic when this deadlock is spotted. This would at least make the issue easier to debug.
But yes, I think you are right in that the async mutex would need to take the future by value if it has the capability of polling it.
It passes ownership directly to the next future in the lock queue. If it was instead more similar to a futex [1], this problem could have been avoided.
My assumption is that tokio went with this design because simple Future subexecutors [2] tend to have very poor scheduling. Often they poll each of their child futures in turn regardless of which were actually woken. With an async locks closer in design to a futex, this could lead to subexecutor child futures being starved out.
If that was truly tokio’s reasoning for the design of their Mutex, I still kinda disagree with the choice; it shouldn’t be the lock’s job to fix tokio::select! being bad at scheduling.
[0]: We should be specific that we’re discussing tokio’s Mutex. this is one particular implementation of async locks.
[1]: wake next in queue, but don’t pass ownership. the woken task must CAS to actually acquire.
[2]: think tokio::select! or futures_lite::future::Or. but not FuturesUnordered which does child wakeups properly.
> It passes ownership directly to the next future in the lock queue. If it was instead more similar to a futex [1], this problem could have been avoided.
...sometimes? Wouldn't we still have the problem if the future runs (actually getting the lock) but then awaits again while holding it? I think that's common—if you're not awaiting while holding the lock, then why didn't you just use a simple std::sync::Mutex?
It is well known that Rust's async model can lead to deadlocks. However, in the futurelock case I have come around to blaming the Async Locks. The issues is that they are not "fair" in that they don't poll the future holding the lock. There may be some other tradeoffs that would happen if the locks were in some way "fairer" but I think they should be explored.
> The issues is that they are not "fair" in that they don't poll the future holding the lock.
How would you change that? A section of code doesn't have access to its own future to call into.
Best I can think is that you can't just call `let guard = mut.lock().await` but instead have to do `mut.do_locked(fut).await`, so that other `do_locked` calls can poll `fut`. I think that would work, but it seems quite awkward. Then again, imho async mutexes are something that should be used quite sparingly, so maybe that's okay.
Disclaimer - I'm not a Tokio dev so what I say may be very wrong. Some definitions:
My understanding is that Tokio async locks have a queue of tasks waiting on lock. When a lock is unlocked, the runtime polls the task at the front of the queue. Futurelock happens when the task locks the lock, then attempts to lock it a second time. This can happen when a sub-future of the top level task already has the lock, then it polls a different future which tries to take the lock.
This situation should be detectable because Tokio tracks which task is holding an async lock. One improvement could be to panic when this deadlock is spotted. This would at least make the issue easier to debug.
But yes, I think you are right in that the async mutex would need to take the future by value if it has the capability of polling it.
2 replies →
My opinion is tokio’s Mutex [0] is too fair.
It passes ownership directly to the next future in the lock queue. If it was instead more similar to a futex [1], this problem could have been avoided.
My assumption is that tokio went with this design because simple Future subexecutors [2] tend to have very poor scheduling. Often they poll each of their child futures in turn regardless of which were actually woken. With an async locks closer in design to a futex, this could lead to subexecutor child futures being starved out.
If that was truly tokio’s reasoning for the design of their Mutex, I still kinda disagree with the choice; it shouldn’t be the lock’s job to fix tokio::select! being bad at scheduling.
[0]: We should be specific that we’re discussing tokio’s Mutex. this is one particular implementation of async locks.
[1]: wake next in queue, but don’t pass ownership. the woken task must CAS to actually acquire.
[2]: think tokio::select! or futures_lite::future::Or. but not FuturesUnordered which does child wakeups properly.
> It passes ownership directly to the next future in the lock queue. If it was instead more similar to a futex [1], this problem could have been avoided.
...sometimes? Wouldn't we still have the problem if the future runs (actually getting the lock) but then awaits again while holding it? I think that's common—if you're not awaiting while holding the lock, then why didn't you just use a simple std::sync::Mutex?
2 replies →