← Back to context

Comment by skavi

7 hours ago

My opinion is tokio’s Mutex [0] is too fair.

It passes ownership directly to the next future in the lock queue. If it was instead more similar to a futex [1], this problem could have been avoided.

My assumption is that tokio went with this design because simple Future subexecutors [2] tend to have very poor scheduling. Often they poll each of their child futures in turn regardless of which were actually woken. With an async locks closer in design to a futex, this could lead to subexecutor child futures being starved out.

If that was truly tokio’s reasoning for the design of their Mutex, I still kinda disagree with the choice; it shouldn’t be the lock’s job to fix tokio::select! being bad at scheduling.

[0]: We should be specific that we’re discussing tokio’s Mutex. this is one particular implementation of async locks.

[1]: wake next in queue, but don’t pass ownership. the woken task must CAS to actually acquire.

[2]: think tokio::select! or futures_lite::future::Or. but not FuturesUnordered which does child wakeups properly.

> It passes ownership directly to the next future in the lock queue. If it was instead more similar to a futex [1], this problem could have been avoided.

...sometimes? Wouldn't we still have the problem if the future runs (actually getting the lock) but then awaits again while holding it? I think that's common—if you're not awaiting while holding the lock, then why didn't you just use a simple std::sync::Mutex?

  • futurelock [0] was special specifically because of this aspect where even a future which seemingly acquires and releases a lock in a single poll triggers a deadlock.

    what you describe is just a standard async deadlock. much easier to spot when debugging. and one can reason about those deadlocks in pretty much the same way one would reason about deadlocks between threads.

    [0]: as named and described in https://rfd.shared.oxide.computer/rfd/0609

    • > futurelock [0] was special specifically because of this aspect where even a future which seemingly acquires and releases a lock in a single poll triggers a deadlock.

      Their description at the top doesn't seem to match that:

      RFD> This RFD describes futurelock: a type of deadlock where a resource owned by Future A is required for another Future B to proceed, while the Task responsible for both Futures is no longer polling A. Futurelock is a particularly subtle risk in writing asynchronous Rust.

      ...and further on they describe lock acquisition as an example of the resource:

      RFD> future F1 is blocked on future F2 in some way (e.g., acquiring a shared Mutex)

      ...so I think they meant it to be more general.

      > what you describe is just a standard async deadlock. much easier to spot when debugging. and one can reason about those deadlocks in pretty much the same way one would reason about deadlocks between threads.

      I think the not-being-polled aspect of it is a bit more subtle than between threads. More like thread vs signal/interrupt handler actually, except it's not as well-known that "branch taken after a `select!`" or "place where two futures exist and `join!`/`spawn` isn't being used" is such a special case for scheduling.

      ...and anyway, with a mutex that has an actual reason to be async, how can you have only a acquire bug but not also have a potential mid-holding bug? You can say the latter is a different class of bug so you've solved futurelock, but you still have a bug any time you would have had futurelock, so semantics 1 working program 0.

      1 reply →