Comment by mr90210

5 days ago

> Congrats, rustc forced you to wrap all your types in Arc<Mutex<_>>

Also, don’t people know that a Mutex implies lower throughput depending on how long said Mutex is held?

Lock-free data structures/algorithms are an attempt to address the drawbacks of mutexes.

https://en.wikipedia.org/wiki/Lock_(computer_science)#Disadv...

Lock-free and even wait-free approaches are not a panacea. Memory contention is fundamentally expensive on today's CPU architectures (the hardware locks, even if you ostensibly don't). High-contention lock-free structures routinely perform worse than serialized locking.

The overhead of a Mutex in the uncontended case is negligible. If Mutex acquisition starts to measurably limit your production performance, you have options, but you will probably need to reconsider your use of shared mutable state anyway.

Lock-free data structures and algorithms access shared memory via various atomic operations such as compare-and-swap and atomic arithmetic. The throughput of these operations does not scale with the number of CPU cores. On the contrary, throughput usually drops as the core count grows, because the cores need more time to synchronize their local per-CPU caches with main memory. So lock-free data structures and algorithms do not scale on systems with a large number of CPU cores.

It is preferable to use "shared nothing" data structures and algorithms instead, where every CPU core processes its own portion of the state, which isn't shared with other cores. In this case the local state can be processed from local per-CPU caches at a speed that exceeds main memory read/write bandwidth, and with lower access latency.

  • High write contention on a memory location does not scale with the number of cores (in fact it is bad even with two cores).

    This is independent of using a mutex, a lock free algorithm or message passing (because at the end of the day a queue is a memory location).

Lock-free data structures do not guarantee higher throughput. They guarantee lower latency, which often comes at the expense of throughput. A typical approach for implementing a lock-free data structure is to allow one thread to "take over" the execution of another by repeating parts of its work. This ensures the system as a whole makes progress even if one thread isn't being scheduled. It is mainly useful when you have CPUs competing for work running in parallel.

The performance of high-contention code is really tricky to reason about and depends on a lot of factors. Just replacing a mutex with a lock-free data structure will not magically speed up your code. Eliminating the contention entirely is typically much better.