Comment by enedil
4 years ago
This isn't true, this form of conditionals can be compiled into cmov type of instructions, which is faster than regular jump if condition.
Both ?: and if-else have cases where they can be compiled into cmov-type instructions and cases where they cannot. Given int max(int a, int b) { if (a > b) return a; else return b; }, a decent compiler for x86 will avoid conditional branches even though ?: wasn't used. Given int f(int x) { return x ? g() : h(); }, avoiding conditional branches is more costly than just using them, even though ?: was used.
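To make the two cases concrete, here they are as compilable C (the g()/h() stubs are hypothetical stand-ins; whether a given compiler actually emits cmov for max depends on the target and optimization level):

```c
/* Branch in the source, but on x86 compilers typically turn this
   pattern into branchless code (cmov or similar) at -O2. */
int max(int a, int b) {
    if (a > b) return a;
    else return b;
}

/* Hypothetical stand-ins for g() and h(). Because each arm of the
   ?: contains a call, the compiler has to pick one side to execute,
   so a conditional branch is emitted even though ?: was used. */
static int g(void) { return 1; }
static int h(void) { return 2; }

int f(int x) { return x ? g() : h(); }
```

The point is that cmov generation is driven by what the arms compute, not by whether the source spelled the conditional as ?: or if-else.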
> This isn't true, this form of conditionals can be compiled into cmov type of instructions, which is faster than regular jump if condition.
IIRC cmov is actually quite slow. It's just faster than an unpredictable branch. Most branches are predictable, so you generally don't want a cmov.
Speaking of which, a couple questions regarding this for anyone who might know:
1. Can you disable cmov on x64 on any compiler? How?
2. Why is cmov so slow? Does it kill register renaming or something like that?
cmov itself isn't slow, it has a latency of 2 cycles on Intel; and only 1 cycle on AMD (same speed as an add). However, cmov has to wait until all three inputs (condition flag, old value of target register, value of source register) are available, even though one of those inputs ends up going unused.
A correctly predicted branch allows the subsequent computation (using of the result of the ?: operator) to start speculatively after waiting only for the relevant input value, without having to wait for the condition or the value on the unused branch. This could sometimes save hundreds of cycles if an unused input is slow due to a cache miss.
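The dependency difference described above can be pictured with a minimal function (the name is mine):

```c
/* With a cmov, the result waits on all three inputs -- cond, a,
   AND b -- even though one of a/b goes unused.
   With a correctly predicted branch, the consumer of the result can
   start speculatively as soon as the predicted side's value arrives,
   without waiting for cond or for the value on the other side. */
int pick(int cond, int a, int b) {
    return cond ? a : b;
}
```

If, say, b comes from a load that misses in cache but cond almost always selects a, the branchy version hides the miss behind speculation while the cmov version stalls on it.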
I wonder if there's anyone on earth who needs nicely formatted human-readable file sizes that's worried about the difference between one or two CPU-cycle branching instructions?
There might be a few guys at FAANG who have a planet-scale use case for human readable file sizes. But surely "performance optimising" this is _purely_ code golf geekiness?
(Which is a perfectly valid reason to do it, but I'm going to choose the version that's most obvious to the next programmer reading it over one that's 50% or 500% or 5000% faster, in almost any use case where I can imagine needing this... I mean, it's only looking for the 6 prefixes "KMGTPE"; a six-line case statement would work for most people?)
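For the record, the "obvious" version the commenter has in mind might look something like this (a sketch only; function name and formatting choices are mine, using binary 1024-based units):

```c
#include <stdio.h>
#include <stddef.h>

/* Straightforward human-readable size formatting over the
   KMGTPE prefixes: divide by 1024 until the value fits. */
static void humanize(unsigned long long bytes, char *out, size_t outlen) {
    static const char prefixes[] = "KMGTPE";
    if (bytes < 1024) {
        snprintf(out, outlen, "%llu B", bytes);
        return;
    }
    double v = (double)bytes;
    int i = -1;
    while (v >= 1024.0 && i < 5) {
        v /= 1024.0;
        i++;
    }
    snprintf(out, outlen, "%.1f %ciB", v, prefixes[i]);
}
```

No branch-prediction heroics required; the loop runs at most six times.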
Ahh, thank you! Makes sense.
CMOV is slow because x86 processors will not speculate past a CMOV instruction. They do speculate past conditional jumps, so those are more performant.
This same property makes CMOV useful in Spectre mitigation, see https://llvm.org/docs/SpeculativeLoadHardening.html
Keeping CMOV slow is now an important security feature.
They don't speculate past a CMOV at all? Like even if the next instruction has nothing to do with the CMOV's output?
This email thread from Linus might be interesting: https://yarchive.net/comp/linux/cmov.html
My understanding of out-of-order (and pipelined) CPUs is limited, but it’s interesting that CMOV isn’t interpreted as a “Jcc over MOV” by the decoder. That would allow using the branch predictor. Would it be too complex or does the microarchitecture not even allow it?
I think that thread is where I first learned this actually. Didn't remember it until you linked it now, thanks for posting it!
If the if/else is simple the compiler should be able to optimize that anyway.