Comment by elchananHaas

9 hours ago

Disclaimer - I'm not a Tokio dev so what I say may be very wrong. Some definitions:

    Future = a structure with a method poll(self: Pin<&mut Self>, ...) -> Poll<Self::Output>. Futures are often composed of other futures and need to poll them to make progress.


    Tokio task = a top-level future that is driven by the Tokio runtime. Tasks are the only futures the runtime polls on its own; every other future runs only when something else polls it.
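
To make the two definitions concrete, here is a minimal sketch using only the standard library. `ReadyOnSecondPoll`, `AddOne`, `NoopWake`, and `poll_by_hand` are names made up for illustration. `poll_by_hand` only stands in for what a runtime does for a task, with no real wake-ups or scheduling, so it is not how Tokio is implemented.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; enough for polling by hand.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Leaf future: pending on the first poll, ready with 7 on the second.
struct ReadyOnSecondPoll {
    polled_once: bool,
}

impl Future for ReadyOnSecondPoll {
    type Output = u32;
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        if self.polled_once {
            Poll::Ready(7)
        } else {
            self.polled_once = true;
            cx.waker().wake_by_ref(); // ask to be polled again
            Poll::Pending
        }
    }
}

/// Composite future: it only makes progress by polling its inner future.
struct AddOne {
    inner: ReadyOnSecondPoll,
}

impl Future for AddOne {
    type Output = u32;
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        // The inner future is Unpin, so Pin::new is fine here.
        match Pin::new(&mut self.inner).poll(cx) {
            Poll::Ready(n) => Poll::Ready(n + 1),
            Poll::Pending => Poll::Pending,
        }
    }
}

/// Stand-in for a runtime driving a top-level future: polls up to `max_polls`
/// times and returns the output together with the number of polls it took.
fn poll_by_hand<F: Future + Unpin>(mut fut: F, max_polls: usize) -> Option<(F::Output, usize)> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    for n in 1..=max_polls {
        if let Poll::Ready(out) = Pin::new(&mut fut).poll(&mut cx) {
            return Some((out, n));
        }
    }
    None
}
```

Calling `poll_by_hand(AddOne { inner: ReadyOnSecondPoll { polled_once: false } }, 5)` yields `Some((8, 2))`: the inner future never runs unless `AddOne` is polled, and `AddOne` never runs unless something at the top polls it.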

My understanding is that Tokio async locks keep a queue of tasks waiting on the lock. When the lock is unlocked, the runtime polls the task at the front of the queue. Futurelock happens when a task locks the lock, then attempts to lock it a second time. This can happen when one sub-future of the top-level task already holds the lock, and the task then polls a different sub-future which tries to take the same lock.
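
Here is a rough, runtime-free model of that scenario, as a sketch only. `State`, `ToyLock`, `Guard`, `Acquire`, and `demo` are all hypothetical names; this is a toy FIFO lock polled by hand, not Tokio's mutex, and it skips waker registration and cancellation entirely. Sub-future A takes the lock and suspends, then the "task" polls only sub-future B.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::future::{poll_fn, Future};
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; the demo polls by hand instead of using a runtime.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Toy lock state (hypothetical, not Tokio's): current owner plus FIFO waiters.
#[derive(Default)]
struct State {
    owner: Option<u32>,
    queue: VecDeque<u32>,
}
type ToyLock = Rc<RefCell<State>>;

/// Dropping the guard hands the lock to the waiter at the front of the queue.
struct Guard(ToyLock);
impl Drop for Guard {
    fn drop(&mut self) {
        let mut s = self.0.borrow_mut();
        let next = s.queue.pop_front();
        s.owner = next;
    }
}

/// A lock attempt by the sub-future with the given `id`.
struct Acquire {
    lock: ToyLock,
    id: u32,
    queued: bool,
}

impl Future for Acquire {
    type Output = Guard;
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Guard> {
        let lock = self.lock.clone();
        let granted = {
            let mut s = lock.borrow_mut();
            if s.owner.is_none() && s.queue.is_empty() {
                s.owner = Some(self.id); // uncontended: take the lock
            } else if s.owner != Some(self.id) && !self.queued {
                s.queue.push_back(self.id); // contended: wait in line
                self.queued = true;
            }
            s.owner == Some(self.id)
        };
        if granted { Poll::Ready(Guard(lock)) } else { Poll::Pending }
    }
}

/// Returns (B stayed stuck while A went unpolled, both finished once A was polled again).
fn demo() -> (bool, bool) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let lock = ToyLock::default();

    // Sub-future A: takes the lock, then suspends once while holding the guard.
    let a_lock = lock.clone();
    let mut a = Box::pin(async move {
        let _guard = Acquire { lock: a_lock, id: 1, queued: false }.await;
        let mut yielded = false;
        poll_fn(move |_| {
            if yielded { Poll::Ready(()) } else { yielded = true; Poll::Pending }
        })
        .await;
    });
    // Sub-future B: wants the same lock.
    let mut b = Acquire { lock, id: 2, queued: false };

    assert!(a.as_mut().poll(&mut cx).is_pending()); // A now holds the lock

    // The task polls only B from here on, so A never gets to drop its guard.
    let stuck = (0..1000).all(|_| Pin::new(&mut b).poll(&mut cx).is_pending());

    // Polling A again lets it finish and release the lock, which unblocks B.
    let a_done = a.as_mut().poll(&mut cx).is_ready();
    let b_done = Pin::new(&mut b).poll(&mut cx).is_ready();
    (stuck, a_done && b_done)
}
```

In this model `demo()` returns `(true, true)`: B stays pending for as long as A goes unpolled, no matter how often B itself is polled, and both complete as soon as A is polled again.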

This situation should be detectable because Tokio tracks which task is holding an async lock. One improvement could be to panic when this deadlock is spotted. This would at least make the issue easier to debug.

But yes, I think you are right in that the async mutex would need to take the future by value if it has the capability of polling it.

3 comments

scottlamb 9 hours ago

> This situation should be detectable because Tokio tracks which task is holding an async lock. One improvement could be to panic when this deadlock is spotted. This would at least make the issue easier to debug.

That'd be a nice improvement! It could give a clear error message instead of hanging.

...but if they actually are polling both futures correctly via `tokio::join!` or similar, wouldn't it also cause an error where otherwise it'd actually work?

elchananHaas 9 hours ago
Oof, I think you are right. The issue with futurelock is a failure of liveness: the future holding the lock doesn't get polled. `tokio::join!` would keep it alive, so my suggestion would mistakenly panic in a case that actually works.
Yeah, the true fix is probably some form of the fabled linear types or structured concurrency, where you can guarantee liveness properties.
- scottlamb 5 hours ago
  
  On third thought, maybe your detection idea would work. I think you're right that the tokio runtime knows the lock is owned by this task's future A, and that this task's future B is waiting for the same lock. So far that's arguably fine (if inefficient, since the same task is trying to acquire the lock twice in parallel).
  I think it also should know that after future A has been awoken, the next call into the task's outermost future returns `Poll::Pending` without polling future A, which is the suspicious part.
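
  A sketch of what that check could look like, under heavy assumptions: `Liveness`, `Tracked`, `looks_starved`, and `outer_poll_is_suspicious` are hypothetical names, and nothing here is an existing Tokio API. It only records "woken" and "polled since" for one sub-future and compares them after the outer poll returns.

  ```rust
  use std::cell::Cell;
  use std::future::{pending, Future};
  use std::pin::Pin;
  use std::rc::Rc;
  use std::sync::Arc;
  use std::task::{Context, Poll, Wake, Waker};

  /// Waker that does nothing; the sketch polls by hand.
  struct NoopWake;
  impl Wake for NoopWake {
      fn wake(self: Arc<Self>) {}
  }

  /// Hypothetical per-sub-future bookkeeping a runtime might keep.
  #[derive(Default)]
  struct Liveness {
      woken: Cell<bool>,  // set when the lock wakes this sub-future
      polled: Cell<bool>, // set when the sub-future is polled after that
  }

  /// Wrapper that records every poll of the inner future.
  struct Tracked<F> {
      inner: F,
      live: Rc<Liveness>,
  }

  impl<F: Future + Unpin> Future for Tracked<F> {
      type Output = F::Output;
      fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<F::Output> {
          self.live.polled.set(true);
          Pin::new(&mut self.inner).poll(cx)
      }
  }

  /// The check itself: A was woken, yet the outer poll came back pending
  /// without ever polling A.
  fn looks_starved(live: &Liveness, outer: &Poll<()>) -> bool {
      outer.is_pending() && live.woken.get() && !live.polled.get()
  }

  /// Simulates one outer poll of the task after sub-future A was woken.
  fn outer_poll_is_suspicious(poll_a: bool) -> bool {
      let waker = Waker::from(Arc::new(NoopWake));
      let mut cx = Context::from_waker(&waker);
      let live = Rc::new(Liveness::default());
      let mut a = Tracked { inner: pending::<()>(), live: live.clone() };

      live.woken.set(true); // pretend the lock was just handed to A

      let outer: Poll<()> = if poll_a {
          Pin::new(&mut a).poll(&mut cx) // join-like: A is still being driven
      } else {
          Poll::Pending // the task moved on and no longer polls A
      };
      looks_starved(&live, &outer)
  }
  ```

  Here `outer_poll_is_suspicious(false)` is `true` and `outer_poll_is_suspicious(true)` is `false`, so the join-like case is not flagged. Whether a real runtime could track this cheaply, or without false positives, is an open question this sketch does not answer.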
  > Yeah, the true fix is probably some form of the fabled Linear types/Structured concurrency where you can guarantee liveness properties.
  Maybe? I really don't know the details well enough to say a linear types thing could guarantee not only that the thing isn't dropped but also that it continues getting polled in a timely way.