Comment by loeg

4 hours ago

Having both the producer and the consumer mutate the same cache line to take a lock has real practical implications, and this "lock-free" design fundamentally avoids that. It isn't meaningless.

That only explains the last stage. In order to steelman the mutex alternative, everything before "further optimization" should have used 2 critical sections. That would give a realistic baseline.
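
To make the cache-line point concrete, here is a rough sketch of a single-producer/single-consumer ring where each side writes only its own index, and the two indices sit on separate cache lines. This is not the article's code: `SpscRing`, the 64-byte line size, and the power-of-two capacity are all assumptions for illustration.

```cpp
#include <atomic>
#include <cstddef>
#include <optional>

// Hypothetical SPSC ring buffer (illustrative only, not the article's code).
// Assumes a 64-byte cache line and a default-constructible, copyable T.
template <typename T, std::size_t N>
class SpscRing {
    static_assert((N & (N - 1)) == 0, "N must be a power of two");

public:
    // Producer thread only: writes tail_, only reads head_.
    bool push(const T& v) {
        const std::size_t t = tail_.load(std::memory_order_relaxed);
        if (t - head_.load(std::memory_order_acquire) == N) {
            return false;  // full
        }
        buf_[t & (N - 1)] = v;
        // Release publishes the slot write to the consumer.
        tail_.store(t + 1, std::memory_order_release);
        return true;
    }

    // Consumer thread only: writes head_, only reads tail_.
    std::optional<T> pop() {
        const std::size_t h = head_.load(std::memory_order_relaxed);
        if (h == tail_.load(std::memory_order_acquire)) {
            return std::nullopt;  // empty
        }
        T v = buf_[h & (N - 1)];
        // Release tells the producer the slot can be reused.
        head_.store(h + 1, std::memory_order_release);
        return v;
    }

private:
    // Each index gets its own cache line, so the producer's stores to tail_
    // and the consumer's stores to head_ never dirty the same line.
    alignas(64) std::atomic<std::size_t> head_{0};
    alignas(64) std::atomic<std::size_t> tail_{0};
    alignas(64) T buf_[N];
};
```

With a mutex, both threads have to write the lock word itself, so that one line bounces between cores on every push and pop regardless of how the data is laid out. Here each side still reads the other's index, but only one side ever writes a given line.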