
Comment by valyala

5 days ago

Lock-free data structures and algorithms access shared memory via atomic operations such as compare-and-swap and atomic arithmetic. The throughput of these operations does not scale with the number of CPU cores. On the contrary, throughput usually drops as the number of CPU cores grows, because the cores need more time to synchronize their local per-CPU caches with main memory. So lock-free data structures and algorithms do not scale on systems with a large number of CPU cores. It is preferable to use "shared nothing" data structures and algorithms instead, where every CPU core processes its own portion of state, which isn't shared with other CPU cores. In this case the local state can be processed from the local per-CPU caches at a speed that exceeds the main memory read/write bandwidth, and with lower access latency.
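
As a rough illustration of the "shared nothing" idea, here is a minimal Go sketch (the names `shard` and `shardedCounter` are made up for the example, not from any particular library): each worker only ever writes to its own cache-line-padded shard, so updates stay in that core's local cache, and the shards are combined only when a reader asks for the total. The contended alternative would be a single `atomic.Int64` hammered by every core.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// shard holds one worker's private counter, padded to its own 64-byte cache
// line so writers do not invalidate each other's per-CPU caches (no false sharing).
type shard struct {
	n atomic.Int64
	_ [56]byte
}

// shardedCounter approximates "shared nothing": each worker writes only its
// own shard; shards are summed only when the total is actually needed.
type shardedCounter struct {
	shards []shard
}

func newShardedCounter() *shardedCounter {
	return &shardedCounter{shards: make([]shard, runtime.GOMAXPROCS(0))}
}

func (c *shardedCounter) Inc(worker int) {
	c.shards[worker%len(c.shards)].n.Add(1)
}

func (c *shardedCounter) Total() int64 {
	var total int64
	for i := range c.shards {
		total += c.shards[i].n.Load()
	}
	return total
}

func main() {
	const perWorker = 1_000_000
	workers := runtime.GOMAXPROCS(0)

	c := newShardedCounter()
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				c.Inc(w)
			}
		}(w)
	}
	wg.Wait()
	fmt.Println(c.Total()) // workers * perWorker
}
```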

High write contention on a single memory location does not scale with the number of cores (in fact it is bad even with two cores).

This is independent of whether you use a mutex, a lock-free algorithm, or message passing (because at the end of the day a queue is a memory location). A benchmark sketch follows below.
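
A rough Go benchmark sketch of that point (the benchmark names are invented for the example): a mutex-guarded counter, an atomic counter, and a channel feeding a single consumer all funnel every core into one contended location, so swapping the synchronization primitive doesn't remove the contention.

```go
package contention

import (
	"sync"
	"sync/atomic"
	"testing"
)

// All three benchmarks push every goroutine toward a single shared location.

func BenchmarkMutexCounter(b *testing.B) {
	var mu sync.Mutex
	var n int64
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			mu.Lock()
			n++
			mu.Unlock()
		}
	})
}

func BenchmarkAtomicCounter(b *testing.B) {
	var n atomic.Int64
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			n.Add(1)
		}
	})
}

func BenchmarkChannelCounter(b *testing.B) {
	ch := make(chan struct{}, 1024)
	done := make(chan struct{})
	var n int64
	go func() {
		// Single consumer drains the queue; the channel itself is the
		// shared memory location every producer contends on.
		for range ch {
			n++
		}
		close(done)
	}()
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			ch <- struct{}{}
		}
	})
	close(ch)
	<-done
}
```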