Comment by dataflow
4 years ago
> This isn't true, this form of conditionals can be compiled into cmov type of instructions, which is faster than regular jump if condition.
IIRC cmov is actually quite slow. It's just faster than an unpredictable branch. Most branches are fairly predictable, so you generally don't want a cmov.
Speaking of which, a couple questions regarding this for anyone who might know:
1. Can you disable cmov on x64 on any compiler? How?
2. Why is cmov so slow? Does it kill register renaming or something like that?
cmov itself isn't slow: it has a latency of 2 cycles on Intel and only 1 cycle on AMD (same speed as an add). However, cmov has to wait until all three inputs (condition flag, old value of target register, value of source register) are available, even though one of those inputs ends up going unused.
A correctly predicted branch allows the subsequent computation (whatever uses the result of the ?: operator) to start speculatively after waiting only for the relevant input value, without having to wait for the condition or the value on the unused branch. This could sometimes save hundreds of cycles if an unused input is slow due to a cache miss.
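To make the dependency point concrete, here is a rough sketch in C. The function names are made up for illustration, and whether the ternary form actually becomes a cmov or a branch depends on the compiler, the flags, and the surrounding code, so check the generated assembly rather than trusting the source form. The mask version is only there to show what "waits on all inputs" means: its result cannot be computed until cond, a, and b are all available, which is the same data dependency a cmov has.

```c
#include <assert.h>

/* Ternary form. On x86-64 a compiler may lower this to cmov, or it may
   emit a compare and a conditional jump; the C source does not decide. */
unsigned select_ternary(unsigned cond, unsigned a, unsigned b) {
    return cond ? a : b;
}

/* Explicitly branchless form. The result depends on cond, a, AND b,
   even though only one of a/b contributes to the final value. */
unsigned select_mask(unsigned cond, unsigned a, unsigned b) {
    unsigned mask = 0u - (unsigned)(cond != 0); /* all ones if cond is nonzero, else 0 */
    return (a & mask) | (b & ~mask);
}
```

Both functions return the same value for every input. The difference is only in what the CPU has to wait for: with a predicted branch, code after the select can start as soon as the chosen operand is ready.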
I wonder if there's anyone on earth who needs nicely formatted human-readable file sizes and is worried about the difference between one- and two-cycle branching instructions?
There might be a few guys at FAANG who have a planet-scale use case for human readable file sizes. But surely "performance optimising" this is _purely_ code golf geekiness?
(Which is a perfectly valid reason to do it, but I'm going to choose the version that's most obvious to the next programmer reading it over one that's 50% or 500% or 5000% faster, in almost any use case where I can think I'm likely to need this... I mean, it's only looking for 6 prefixes, "KMGTPE"; a six-line case statement would work for most people?)
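For what it's worth, the "obvious" version is about this long. This is a minimal sketch, not the snippet under discussion: human_size is a hypothetical helper, it assumes 1024-based units and one decimal place, and it does not handle rounding at unit boundaries (a value just under 1024K will print as "1024.0K").

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes a short human-readable size into buf.
   Assumes 1024-based units; returns whatever snprintf returns. */
int human_size(unsigned long long bytes, char *buf, size_t buflen) {
    static const char prefixes[] = "KMGTPE";
    assert(buf != NULL && buflen > 0);

    if (bytes < 1024) {
        return snprintf(buf, buflen, "%lluB", bytes);
    }

    double value = (double)bytes / 1024.0; /* now in units of prefixes[0] */
    size_t i = 0;
    /* Step up one prefix at a time, stopping at the last one ('E'). */
    while (value >= 1024.0 && i + 1 < strlen(prefixes)) {
        value /= 1024.0;
        i++;
    }
    return snprintf(buf, buflen, "%.1f%c", value, prefixes[i]);
}
```

Calling human_size(1536, buf, sizeof buf) leaves "1.5K" in buf. The loop runs at most five times, so there is not much left to optimise.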
Actually, I just realised: this is (probably a small part of) why "calculate all sizes" in Mac Finder windows is so slow. I already mentioned Apple in FAANG, but I guess someone at Microsoft and people who work on Linux file browsers care too. And whoever maintains the -h flag codepaths in all the Unix-like utils that support it?
Ahh, thank you! Makes sense.
CMOV is slow because x86 processors will not speculate past a CMOV instruction. They do speculate past conditional jumps, so those are more performant.
This same property makes CMOV useful in Spectre mitigation, see https://llvm.org/docs/SpeculativeLoadHardening.html
Keeping CMOV slow is now an important security feature.
They don't speculate past a CMOV at all? Like even if the next instruction has nothing to do with the CMOV's output?
I think out-of-order execution is considered different from speculative execution, but I could be remembering my architecture class wrong.
This email thread from Linus might be interesting: https://yarchive.net/comp/linux/cmov.html
My understanding of out-of-order (and pipelined) CPUs is limited, but it’s interesting that CMOV isn’t interpreted as a “Jcc over MOV” by the decoder. That would allow using the branch predictor. Would it be too complex or does the microarchitecture not even allow it?
I think that thread is where I first learned this actually. Didn't remember it until you linked it now, thanks for posting it!