Comment by Karliss

1 year ago

The portable alternative is being explicit with your loops, possibly in combination with gigantic unwieldy switch statements or some regular goto (it is still part of standard). But that comes at the cost of readability and sometimes performance.Whether it's practical depends on the usecase. For something like recursive data structure algorithms which are relatively small and self contained, I would say it's perfectly doable. Simple interpreters - maybe. Complex interpreters - here it becomes messy.

See Mike Pall’s posts on the subject—the performance cost is considerable, for two reasons. First, you’re forcing the compiler to do register allocation for the whole interpreter at once, which it can virtually never do a good job of. (This is actually the more important part.)

Second, given the existence of branch target buffers (and the ruinous cost of mispredicted branches), you really want the instruction dispatch to be a single indirect branch at the end of each instruction implementation, and for that standard tools are somewhere between unhelpful (you can write a macro containing switch (*insn++) { case INSN_FOO: goto impl_foo; /* ... */ } but it’s anybody’s guess whether you’re getting a single jump table for all copies of that) and actively obstructive (“tail merging” in older versions of Clang would actively destroy any attempts at copying dispatch code). Granted, sometimes things work out (new Clang versions can sometimes go “looks like you’re writing an interpreter” and turn a vanilla switch in a loop into duplicated dispatch code). Then again, sometimes they don’t, and you can’t actually know.

A switch-case is the default way to write an interpreter and I'd even argue it's the most readable.

In the context of this article, it's all about the performance. Switch-case generates suboptimal code for the commonly used fast paths of the protobuf parser, because the mere existence of the slow paths is enough to interfere with the performance of the code around them.

Yeah, that's the way virtual machines have been written forever, some version of

    for instruction in instructions {
        switch (instruction) {
            case OPCODE_X: //....
            case OPCODE_Y: //....
            case OPCODE_Z: //....
        }
    }

This is how VMs have been written since the dawn of time (or using computed gotos, another non-standard addition to C). It has problems though, like the fact that the `switch` branch is extremely unpredictable, and that you get a massive function which is hard to optimize. This [[musttail]] trick is a huge improvement. But yeah, if you got to support compilers that don't have [[musttail]], you in essence have to have two implementations, the [[musttail]] one and the loop/switch one.