Comment by mananaysiempre
1 year ago
As the disassembly in the post demonstrates, the problem with the fallback path (which is not necessarily the error path) is not how fast the call to it is; it's that the mere existence of that call can force the compiler to create a stack frame and spill registers into it for the whole function, including the fast path.
OK, maybe “force” is not the right word: nobody says the compiler has to have a single stack frame structure for all possible execution paths of a function. Nobody even says it has to use the standard ABI for an internal-linkage (static or anonymous-namespace) function whose address is never taken. But the reality is, all compilers I’ve seen do, including Clang, so we want a way to tell them to not worry about the ABI and avoid wasting time on preserving registers across the call.
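To make that concrete, here is a minimal sketch of the shape of code I mean. It is not the code from the post: `checksum`, `decode_rare` and `rare_calls` are made-up names, and the `noinline` attribute is only there to keep the fallback a real call. Because `a`, `b`, `p`, `n` and `i` are all live across the rare call, a compiler that follows the standard ABI will typically keep them in callee-saved registers (or on the stack) and pay for the save/restore on every invocation, even when the fallback never runs. What you actually get depends on the compiler, version and flags, so check the disassembly rather than trusting this description.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: GCC/Clang-style attribute; expands to nothing elsewhere. */
#if defined(__GNUC__) || defined(__clang__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

/* Hypothetical counter so the rare path has a visible side effect. */
static uint32_t rare_calls;

/* Fallback path: rare, but not an error. Kept out of line on purpose. */
NOINLINE static uint32_t decode_rare(uint8_t byte)
{
    rare_calls++;
    return (uint32_t)(byte & 0x7f) * 3u;
}

/* Fast path: a, b, p, n and i must all survive the call to decode_rare(),
 * so the whole function is shaped by a call that may never happen. */
uint32_t checksum(const uint8_t *p, size_t n)
{
    uint32_t a = 1, b = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t v = p[i];
        if (v & 0x80)
            v = decode_rare((uint8_t)v);
        a += v;
        b += a;
    }
    return (b << 16) ^ a;
}
```

Compiling this with something like `-O2 -S` and looking at the prologue and epilogue of `checksum` is a quick way to see whether your compiler pushes and pops registers that only the fallback call makes necessary.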
Re your nice jump table, sure it does. But if you try running the result under, say, perf report, and your test bytecode doesn’t describe a short loop, you’ll see one of two things: either you had a branch mispredict on each dispatch; or the compiler went “looks like you’re trying to write an interpreter” and moved the indirect jump to the end of each case (I’ve seen Clang do this). And either way the register allocation in the resulting code probably sucks.
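For reference, the two dispatch shapes I'm describing look roughly like the sketch below. The opcode set and the names `run_switch` and `run_threaded` are invented for illustration, and the second variant relies on the GNU "labels as values" extension (supported by GCC and Clang, not standard C). In `run_switch` every opcode goes through one shared indirect jump; in `run_threaded` each handler ends with its own indirect jump, which is the transformation I've seen Clang apply on its own. Both assume well-formed bytecode and do no bounds checking.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy opcode set for a tiny stack machine. */
enum { OP_HALT, OP_PUSH, OP_ADD, OP_MUL };

/* Variant 1: a switch in a loop, so all dispatch shares one indirect jump. */
int64_t run_switch(const uint8_t *code)
{
    int64_t stack[64];
    size_t sp = 0;
    for (;;) {
        switch (*code++) {
        case OP_PUSH: stack[sp++] = (int8_t)*code++; break;
        case OP_ADD:  sp--; stack[sp - 1] += stack[sp]; break;
        case OP_MUL:  sp--; stack[sp - 1] *= stack[sp]; break;
        case OP_HALT:
        default:      return sp ? stack[sp - 1] : 0;
        }
    }
}

/* Variant 2: computed goto (GNU extension); every handler ends with its
 * own indirect jump instead of returning to a common dispatch point. */
int64_t run_threaded(const uint8_t *code)
{
    /* Order must match the opcode enum above. */
    static void *const targets[] = { &&op_halt, &&op_push, &&op_add, &&op_mul };
    int64_t stack[64];
    size_t sp = 0;
#define DISPATCH() goto *targets[*code++]
    DISPATCH();
op_push: stack[sp++] = (int8_t)*code++; DISPATCH();
op_add:  sp--; stack[sp - 1] += stack[sp]; DISPATCH();
op_mul:  sp--; stack[sp - 1] *= stack[sp]; DISPATCH();
op_halt: return sp ? stack[sp - 1] : 0;
#undef DISPATCH
}
```

Both variants compute the same result for the same bytecode; the difference only shows up in the generated code and in how the branch predictor sees the dispatch, which is why it takes a profiler and a non-trivial test program to notice.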
> so we want a way to tell them to not worry about the ABI and avoid wasting time on preserving registers across the call
that's what -fvisibility=internal already does, no?
That’s what static could do (if the function’s address is not taken, or given sufficiently powerful dataflow analysis), but C and C++ compilers don’t take advantage of that. Getting that out of -fvisibility=hidden -flto would also be possible, but requires even more nonexistent compiler smarts. (From a quick web search, I can't figure out what internal visibility brings over hidden.)
(Granted, it’s not like this is completely impossible—I seem to remember GHC and MLton can invent custom calling conventions for Haskell and SML respectively. But the popular C or C++ compilers can’t.)
Really? I've always assumed that static gives C/C++ compilers the right to disregard calling conventions altogether. Now that I think about it, this assumption might be unjustified. Was there any strong reason for compilers to keep the calling convention?