Comment by mr90210

5 days ago

> Congrats, rustc forced you to wrap all your types in Arc<Mutex<_>>

Also, don’t people know that a Mutex implies lower throughput depending on how long said Mutex is held?

Lock-free data structures/algorithms are an attempt to address the drawbacks of mutexes.

https://en.wikipedia.org/wiki/Lock_(computer_science)#Disadv...

Lock-free and even wait-free approaches are not a panacea. Memory contention is fundamentally expensive on today's CPU architectures (the hardware locks, even if you ostensibly don't). High-contention lock-free structures routinely perform worse than serialized locking.

The overhead of a Mutex in the uncontended case is negligible. If Mutex acquisition starts to measurably limit your production performance, you have options, but you will probably need to reconsider your use of shared mutable state anyway.

Lock-free data structures and algorithms access shared memory via various atomic operations such as compare-and-swap and atomic arithmetic. The throughput of these operations does not scale with the number of CPU cores. On the contrary, throughput usually drops as the core count grows, because the cores need more time to synchronize their local per-CPU caches with main memory. So lock-free data structures and algorithms do not scale on systems with a large number of CPU cores.

It is preferable to use "shared nothing" data structures and algorithms instead, where every CPU core processes its own portion of the state, which isn't shared with other cores. In this case the local state can be processed from local per-CPU caches at a speed that exceeds main memory read/write bandwidth, and with lower access latency.

  • High write contention on a memory location does not scale with the number of cores (in fact it is bad even with two cores).

    This is independent of using a mutex, a lock free algorithm or message passing (because at the end of the day a queue is a memory location).

Lock-free data structures do not guarantee higher throughput. They guarantee lower latency, which often comes at the expense of throughput. A typical approach for implementing a lock-free data structure is to allow one thread to "take over" the execution of another by repeating parts of its work. This ensures the system as a whole makes progress even if one thread isn't being scheduled. It is mainly useful when you have CPUs competing for work running in parallel.

The performance of high-contention code is really tricky to reason about and depends on a lot of factors. Just replacing a mutex with a lock-free data structure will not magically speed up your code. Eliminating the contention entirely is typically much better.