Comment by MobiusHorizons
6 hours ago
In short modern CPUs can’t actually execute control flow very fast, so they usually cheat and memorize where the program is likely to go next. This works great with things like the loop branch, since most loops are pretty long the cpu can start on the next iteration of the loop while it waits for the condition to finish being calculated. You incur one branch misprediction at the end, but the rest of the loop is fast. But with a bytecode vm, the action of the code is entirely dependent on the bytecode instructions, so it’s not predictable unless those instructions themselves are predictable. That’s why the switch statement is slow. Explaining the optimization from computed gotos is more complicated.
As I understand it, the branch predictor stores its predictions using the address of the branch instruction as a key. In the switch implementation the unpredictable branch is the one that jumps to the specific case. The compiler emits an indirect branch for this just like the computed goto instruction. But it only emits one indirect branch instruction, so that branch is always unpredictable. In the computed goto version, there is no loop, each case statement ends in its own indirect branch, so each bytecode implementation is followed by a branch with a unique address. In practice this makes each of them slightly more predictable, because you only have to predict what comes after an inc instruction instead of what comes next after any instruction. Basically there are more branches to hang predictions on.
No comments yet
Contribute on Hacker News ↗