Comment by lifis
5 hours ago
But why doesn't the CPU just lock two cachelines? Seems relatively easy to do in microcode, no? Just sort by physical address with a conditional swap and then run the "lock one cacheline algorithm" twice, no?
Perhaps the issue is that each core has a locked-cacheline entry for each other core, but even then, given the size of current CPUs, doubling it shouldn't be that significant. One could also add just a single extra entry plus a global lock that only guards the ability to lock a second cacheline.
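For what it's worth, here is a toy software sketch of the ordering idea: take the two line indices, conditionally swap so the lower one comes first, then run the "lock one line" step twice. Everything in it is an assumption for illustration (the 64-byte line size, the `ToyLineLocks` class, and using mutexes as a stand-in for cacheline locks). It says nothing about how real microcode or coherence hardware works.

```python
import threading

CACHELINE = 64  # assumed line size in bytes


def lines_touched(addr, size):
    """Return the (first, last) cache-line indices covered by an access."""
    return addr // CACHELINE, (addr + size - 1) // CACHELINE


class ToyLineLocks:
    """Hypothetical model: one mutex per cache line."""

    def __init__(self, n_lines):
        self._locks = [threading.Lock() for _ in range(n_lines)]

    def run_locked(self, line_a, line_b, fn):
        # Conditional swap so every caller locks the lower line first.
        lo, hi = (line_a, line_b) if line_a <= line_b else (line_b, line_a)
        self._locks[lo].acquire()
        if hi != lo:
            self._locks[hi].acquire()
        try:
            return fn()
        finally:
            # Release in reverse order of acquisition.
            if hi != lo:
                self._locks[hi].release()
            self._locks[lo].release()
```

Because every caller acquires in the same global order, two threads asking for lines (1, 2) and (2, 1) cannot end up each holding one line while waiting on the other. Whether the equivalent is cheap to do in hardware is a separate question.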
I suspect it's the risk of deadlocks, and perhaps they have no easy way to avoid them.
I assume it's to save on resources. Even if your algorithm isn't much more taxing on silicon, maybe the designers at Intel and AMD just didn't think optimizing split locks was worth it.