
Comment by adrian_b

9 hours ago

Incrementing or decrementing a shared counter is done with an atomic instruction, not with a locked critical section.
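As a minimal sketch of the contrast being drawn (C++ here, purely illustrative), the first version bumps the counter with a single atomic instruction while the second goes through a mutex-protected critical section:

```cpp
#include <atomic>
#include <mutex>
#include <thread>

std::atomic<long> atomic_counter{0};   // incremented with one atomic instruction

long plain_counter = 0;                // incremented inside a locked critical section
std::mutex counter_mutex;

void bump_atomic() {
    // On x86 this compiles to a single LOCK-prefixed instruction (e.g. LOCK XADD);
    // no critical section is entered.
    atomic_counter.fetch_add(1, std::memory_order_relaxed);
}

void bump_locked() {
    // The "locked critical section" alternative: acquire a mutex, increment, release.
    std::lock_guard<std::mutex> guard(counter_mutex);
    ++plain_counter;
}

int main() {
    std::thread t1([] { for (int i = 0; i < 1000; ++i) bump_atomic(); });
    std::thread t2([] { for (int i = 0; i < 1000; ++i) bump_atomic(); });
    t1.join();
    t2.join();
    return atomic_counter.load() == 2000 ? 0 : 1;
}
```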

This has negligible overhead in most cases. For instance, if the shared counter is already in a cache, the overhead is smaller than that of a normal non-atomic access to main memory. The intrinsic overhead of an atomic instruction is typically about the same as that of a simple memory access to data stored in the L3 cache, i.e. on the order of 10 nanoseconds at most.

Moreover, many memory allocators use separate per-core memory heaps, so they avoid any accesses to shared memory that would need atomic instructions or locking, except on the rare occasions when they interact with the operating system.
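A rough sketch of that per-thread-heap idea (the names and structure here are assumptions for illustration, not any particular allocator's design): the common allocation path touches only thread-local state, and the shared, locked pool is reached only on the rare refill:

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

struct Block { Block* next; };

std::vector<Block*> global_pool;                 // shared pool, protected by a lock
std::mutex global_pool_mutex;

thread_local Block* local_free_list = nullptr;   // private to each thread

Block* refill_from_global_pool() {
    // Slow path: taken only when the thread-local list is empty.
    std::lock_guard<std::mutex> guard(global_pool_mutex);
    if (global_pool.empty())
        return static_cast<Block*>(::operator new(sizeof(Block)));  // fall back to the runtime/OS
    Block* b = global_pool.back();
    global_pool.pop_back();
    return b;
}

Block* allocate_block() {
    // Fast path: no atomics, no locks, purely thread-local.
    if (Block* b = local_free_list) {
        local_free_list = b->next;
        return b;
    }
    return refill_from_global_pool();
}

void free_block(Block* b) {
    // Frees also stay thread-local in this sketch.
    b->next = local_free_list;
    local_free_list = b;
}
```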

Atomic operations, especially RMW operations, are still very expensive, though. Not as expensive as a syscall, of course, but a lot more expensive than non-atomic ones, precisely because they disrupt things like caching.

  • Not only that, they write back to main memory. There's limited bandwidth between the CPU and main memory, and with multithreading you significantly increase the amount of data transferred between the CPU and memory.

    This is such a problem that the JVM gives each thread its own allocation buffer to allocate into before going back to the shared heap for more space, all to reduce the number of atomic updates to the pointer that tracks free memory in the heap (see the sketch below).
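A rough sketch of that idea (illustrative only, not the JVM's actual TLAB implementation): each thread bump-allocates out of its own buffer with plain non-atomic stores, and only carving a new buffer out of the shared heap performs an atomic read-modify-write on the shared top pointer:

```cpp
#include <atomic>
#include <cstddef>

constexpr std::size_t HEAP_SIZE   = 1 << 20;   // sizes assumed for the sketch
constexpr std::size_t BUFFER_SIZE = 4096;

alignas(16) char heap[HEAP_SIZE];
std::atomic<std::size_t> heap_top{0};          // the shared pointer that would otherwise be contended

thread_local char* tlab_ptr = nullptr;         // thread-local bump pointer
thread_local char* tlab_end = nullptr;

void* allocate(std::size_t bytes) {            // assumes bytes <= BUFFER_SIZE
    // Fast path: plain (non-atomic) bump inside the thread's own buffer.
    if (tlab_ptr && tlab_ptr + bytes <= tlab_end) {
        void* p = tlab_ptr;
        tlab_ptr += bytes;
        return p;
    }
    // Slow path: grab a whole new buffer from the shared heap with ONE atomic
    // RMW, instead of one atomic per object allocation.
    std::size_t start = heap_top.fetch_add(BUFFER_SIZE, std::memory_order_relaxed);
    if (start + BUFFER_SIZE > HEAP_SIZE)
        return nullptr;                        // out of memory in this sketch
    tlab_ptr = heap + start;
    tlab_end = tlab_ptr + BUFFER_SIZE;
    void* p = tlab_ptr;
    tlab_ptr += bytes;
    return p;
}
```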