Comment by josephg

18 hours ago

Nah. (numbers[i] < 500) is an expression which evaluates to true (1) or false (0). Evaluating this doesn't require a branch. There are instructions on modern CPUs to turn this expression into a number without a conditional jump. (cmp (compare), setle (set if comparison was less than or equal), then add).

> then wouldn't they also optimize the naive piece of code?

Great question. They do sometimes!

In general, the problem for compilers is that its not obvious which method would be better in some given piece of code. Most branches are highly predictable. Like, imagine a for loop which counts to 1000. At the end of the loop body, the code branches to see whether we should stay in the loop, or exit the loop. The first 999 times through the loop we keep going - so 99.9% of the time, the branch ends up taking the same path. Its very predictable! CPU designers optimise heavily for this, via branch prediction logic. Highly predictable branches run fast. (This is also why array bounds checking doesn't really hurt performance at all.)

But the branch predictor really struggles when the condition is unpredictable - ie, when a conditional branch is taken about 50% of the time. As is the case in a sorting algorithm.

The compiler has no idea whether any condition in your code is predictable or not. There are hints you can use, but it often defaults to just doing whatever you ask it to do.

Here's what the compiler actually does with the code you quoted. You can see the extra branch + jump for the second version of the code:

https://c.godbolt.org/z/zv7Tcd49f

I clicked around - for some reason even using __builtin_expect_with_probability, none of the compilers I tried will convert from one version of this code into the other.

If you hoist small_numbers[smlen] = numbers[i] out of the if statement then the compiler will produce nearly the same branchless assembly for both cases (the only difference being compare against 499 followed by setle vs. compare against 500 followed by setl).

Taking a second look I want to say you need to hoist the assignment for the loops to be semantically identical anyways.