Comment by dgrunwald

4 years ago

cmov itself isn't slow: it has a latency of 2 cycles on Intel and only 1 cycle on AMD (same speed as an add). However, cmov has to wait until all three inputs (condition flag, old value of the target register, value of the source register) are available, even though one of those inputs ends up going unused.

A correctly predicted branch allows the subsequent computation (use of the result of the ?: operator) to start speculatively after waiting only for the relevant input value, without having to wait for the condition or for the value on the unused branch. This can sometimes save hundreds of cycles if an unused input is slow due to a cache miss.

I wonder if there's anyone on earth who needs nicely formatted, human-readable file sizes and is worried about the difference between one or two CPU cycles of branching instructions?

There might be a few guys at FAANG who have a planet-scale use case for human readable file sizes. But surely "performance optimising" this is _purely_ code golf geekiness?

(Which is a perfectly valid reason to do it, but I'm gonna choose the version most obvious to the next programmer reading it over one that's 50% or 500% or 5000% faster, in almost any use case I can think of where I'm likely to need this... I mean, it's only looking for 6 prefixes "KMGTPE"; a six-line case statement would work for most people?)

  • Actually, I just realised. This is (probably a small part of) why "calculate all sizes" in Mac Finder windows is so slow. I already mentioned Apple in FAANG, but I guess someone at Microsoft and people who work on Linux file browsers care too. And whoever maintains the -h flag codepaths in all the Unix-like utils that support it?

    • Confused what this has to do with calculating file sizes. Time spent computing file sizes is dwarfed by I/O, right?