Comment by mananaysiempre

1 year ago

See Mike Pall’s posts on the subject: the performance cost is considerable, for two reasons. First, you’re forcing the compiler to do register allocation for the whole interpreter at once, which it can virtually never do well. (This is actually the more important part.)

Second, given the existence of branch target buffers (and the ruinous cost of mispredicted branches), you really want the instruction dispatch to be a single indirect branch at the end of each instruction implementation, so that each one gets its own prediction history instead of all of them sharing one. For that, standard tools are somewhere between unhelpful and actively obstructive. Unhelpful: you can write a macro containing switch (*insn++) { case INSN_FOO: goto impl_foo; /* ... */ } but it’s anybody’s guess whether you’re getting a single jump table shared by all copies of it. Obstructive: “tail merging” in older versions of Clang would destroy any attempt at copying dispatch code. Granted, sometimes things work out (new Clang versions can sometimes go “looks like you’re writing an interpreter” and turn a vanilla switch in a loop into duplicated dispatch code). Then again, sometimes they don’t, and you can’t actually know without reading the generated assembly.
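For concreteness, here is a minimal sketch of what "a single indirect branch at the end of each instruction implementation" looks like in source, using the GNU "labels as values" extension (GCC and Clang only, not standard C). The three opcodes, the run function, the targets table, and the DISPATCH macro are all made up for illustration, and nothing here guarantees the compiler keeps the dispatch copies separate; that is exactly the part you still have to verify in the assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical three-instruction stack machine, for illustration only. */
enum { INSN_PUSH, INSN_ADD, INSN_HALT };

/* Replicated dispatch: every handler ends with its own indirect jump
   through the shared table, instead of looping back to one switch. */
#define DISPATCH() goto *targets[*insn++]

static int64_t run(const uint8_t *insn) {
    /* &&label takes a label's address (GNU extension, not ISO C). */
    static void *const targets[] = {
        [INSN_PUSH] = &&impl_push,
        [INSN_ADD]  = &&impl_add,
        [INSN_HALT] = &&impl_halt,
    };
    int64_t stack[16];
    size_t sp = 0;

    DISPATCH();

impl_push:                      /* operand: one signed immediate byte */
    assert(sp < 16);
    stack[sp++] = (int8_t)*insn++;
    DISPATCH();

impl_add:                       /* pop two values, push their sum */
    assert(sp >= 2);
    sp--;
    stack[sp - 1] += stack[sp];
    DISPATCH();

impl_halt:                      /* return the top of the stack */
    assert(sp >= 1);
    return stack[sp - 1];
}
```

Feeding it the bytes INSN_PUSH, 2, INSN_PUSH, 40, INSN_ADD, INSN_HALT returns 42. Note that this only addresses the dispatch problem; the whole interpreter is still one function, so the register allocation complaint applies unchanged.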
