Comment by iTokio

1 year ago

The remaining issue with tail calls used to switch context is that the handlers are still functions, so they must follow a calling convention. Unfortunately, the standard conventions waste registers on saving and restoring state around function entry and exit.

See the LuaJIT Remake blog for an exhaustive analysis and an alternative that uses an intermediate compiler: https://sillycross.github.io/2022/11/22/2022-11-22/
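For readers unfamiliar with the technique being discussed, here is a minimal sketch of tail-call dispatch in C. The opcode set, the handler names, and the MUSTTAIL macro are invented for illustration and are not taken from the linked post. The attribute is only enabled on Clang builds that report support for it; elsewhere the macro expands to nothing, so the calls may grow the stack. The sketch also assumes a well-formed program that ends in OP_HALT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper macro: request a guaranteed tail call where Clang
   reports support for it, otherwise fall back to an ordinary call. */
#if defined(__clang__) && defined(__has_attribute)
#  if __has_attribute(musttail)
#    define MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL
#endif

enum { OP_INC, OP_DOUBLE, OP_HALT, OP_COUNT };

/* All handlers share one signature, so the interpreter state (pc, acc)
   is handed from handler to handler as ordinary arguments. */
typedef int64_t (*op_fn)(const uint8_t *pc, int64_t acc);

static int64_t op_inc(const uint8_t *pc, int64_t acc);
static int64_t op_double(const uint8_t *pc, int64_t acc);
static int64_t op_halt(const uint8_t *pc, int64_t acc);

static const op_fn dispatch_table[OP_COUNT] = { op_inc, op_double, op_halt };

/* Jump to the handler for the next opcode instead of returning to a loop. */
#define DISPATCH() MUSTTAIL return dispatch_table[*pc](pc + 1, acc)

static int64_t op_inc(const uint8_t *pc, int64_t acc)    { acc += 1; DISPATCH(); }
static int64_t op_double(const uint8_t *pc, int64_t acc) { acc *= 2; DISPATCH(); }
static int64_t op_halt(const uint8_t *pc, int64_t acc)   { (void)pc; return acc; }

/* Entry point: a regular call into the first handler, starting from acc = 0. */
int64_t run(const uint8_t *program) {
    assert(program != NULL);
    return dispatch_table[*program](program + 1, 0);
}
```

Calling run() on the byte sequence OP_INC, OP_INC, OP_DOUBLE, OP_INC, OP_HALT walks through four handlers via tail calls before op_halt returns the accumulator. Each handler is still an ordinary function here, which is exactly where the calling-convention overhead mentioned above comes from.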

I’ve seen a few languages over the years drop and reacquire JIT layers. Some of it has to do with programmer skill and lessons learned, but some is also down to CPU generation.

Like everything else in CS, when the cost balance shifts for different kinds of operations, the best algorithm can shift back to something we haven’t used in fifteen or twenty years. It contributes a lot to the faddishness of programming. Bringing something back doesn’t mean there’s no reason for it. But we forget the reasons it wasn’t a panacea last time, so that’s still a problem.

If your main JIT gets faster or slower, then the cost-benefit for running it changes, so the threshold to trigger it gets adjusted, and now the amount of code that runs in the other tiers shifts, which might make the amortized cost of that tier worse. It’s like balancing a double pendulum.
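The threshold argument can be made concrete with a toy break-even calculation. This is only a sketch under simplifying assumptions: costs are in arbitrary units, per-call costs are constant, and the function name and parameters are hypothetical rather than taken from any real VM. Staying in the lower tier for n more calls costs n * lower_tier_cost, while tiering up costs compile_cost + n * upper_tier_cost, so compiling wins once n * (lower_tier_cost - upper_tier_cost) exceeds compile_cost.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest number of future calls for which tiering up is strictly cheaper
   than staying in the lower tier. Returns UINT64_MAX when the upper tier is
   not faster per call, meaning compilation never pays off. */
uint64_t break_even_calls(uint64_t compile_cost,
                          uint64_t lower_tier_cost,
                          uint64_t upper_tier_cost) {
    if (upper_tier_cost >= lower_tier_cost) {
        return UINT64_MAX;
    }
    uint64_t saving_per_call = lower_tier_cost - upper_tier_cost;
    assert(saving_per_call > 0);
    /* Need n * saving_per_call > compile_cost, so n = floor(c / s) + 1. */
    return compile_cost / saving_per_call + 1;
}
```

With a saving of 8 units per call, halving the compile cost from 1000 to 500 moves the break-even point from 126 calls to 63, which in turn changes how much code ever runs in the lower tier. That is the feedback loop the comment describes.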

If you can make a JIT tier fast and dirty enough, you can skip the interpreter entirely. And, from my armchair position, it seems that the cognitive load of the bookkeeping between the interpreter and, say, two JITs is high enough that a few languages have mothballed the interpreter and used a JIT optimized for compile time, not output speed.

And I don’t recall what language, but I’m pretty sure at least one team that did this ended up dropping an intermediate compiler as well, because of that balancing act I mentioned above. It was better to double down on two than to try to handle three.

Clang recently got a new calling convention that makes these tail calls much cheaper (it avoids the need for the callee to preserve some registers). I can never remember the name - it’s either preserve_all or preserve_none (whose perspective is the preservation from?).

  • preserve_none is the new one. It can be applied to the functions performing tail calls so that they can use the full register set.

    I even saw an enhancement recently that will make preserve_none allocate arguments in the registers that are usually callee-saved: https://github.com/llvm/llvm-project/pull/88333

    This will make [[musttail]] + preserve_none a winning combination, particularly when making non-tail calls to fallback functions that use a regular calling convention, because all the arguments to [[musttail]] functions can stay pinned in the callee-saved registers.

    I'm delighted, because this matches what I originally proposed back in 2021, except I called it "reverse_ccc" instead of "preserve_none": https://github.com/haberman/llvm-project/commit/e8d9c75bb35c...

    preserve_all also exists, and has existed for a while. You could use it on fallback functions to help the tail calling functions avoid spilling. But this always seemed like an unsatisfactory solution to me, because it's too intrusive (and requires too much diligence) to tag a bunch of regular functions as preserve_all. It's much more practical to tag all the core interpreter functions as preserve_none.

    • I see signs that Google itself is using preserve_none internally, since the public protobuf repo has PROTOBUF_CC (Calling Convention), but it is defined as nothing:

        #define PROTOBUF_CC

      Is there any chance of this getting out into the wild or is it too dangerous for us mortals?
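To make the preserve_none + musttail combination from this thread concrete, here is a hedged sketch in C. INTERP_CC plays a role similar to a macro like PROTOBUF_CC, but it, MUSTTAIL, the opcodes, and the handlers are all hypothetical names for illustration, not protobuf's actual definitions. The attributes are only enabled when Clang reports both on x86-64 or AArch64; on any other toolchain the macros expand to nothing and the code still runs as plain calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical macros: use preserve_none and musttail only where Clang
   reports both, otherwise compile as ordinary functions and calls. */
#if defined(__clang__) && defined(__has_attribute) && \
    (defined(__x86_64__) || defined(__aarch64__))
#  if __has_attribute(preserve_none) && __has_attribute(musttail)
#    define INTERP_CC __attribute__((preserve_none))
#    define MUSTTAIL  __attribute__((musttail))
#  endif
#endif
#ifndef INTERP_CC
#  define INTERP_CC
#  define MUSTTAIL
#endif

enum { OP_INC, OP_NEG, OP_HALT, OP_COUNT };

/* The calling convention is part of the function pointer type. */
typedef int64_t (INTERP_CC *handler_fn)(const uint8_t *pc, int64_t acc);

INTERP_CC static int64_t op_inc(const uint8_t *pc, int64_t acc);
INTERP_CC static int64_t op_neg(const uint8_t *pc, int64_t acc);
INTERP_CC static int64_t op_halt(const uint8_t *pc, int64_t acc);

static const handler_fn handlers[OP_COUNT] = { op_inc, op_neg, op_halt };

#define DISPATCH() MUSTTAIL return handlers[*pc](pc + 1, acc)

/* Fallback with the regular calling convention: saturate instead of wrapping. */
static int64_t on_overflow(int64_t acc) { return acc; }

INTERP_CC static int64_t op_inc(const uint8_t *pc, int64_t acc) {
    if (acc == INT64_MAX) {
        acc = on_overflow(acc); /* rare non-tail call into regular code */
    } else {
        acc += 1;
    }
    DISPATCH();
}

INTERP_CC static int64_t op_neg(const uint8_t *pc, int64_t acc) {
    acc = -acc;
    DISPATCH();
}

INTERP_CC static int64_t op_halt(const uint8_t *pc, int64_t acc) {
    (void)pc;
    return acc;
}

/* Regular-convention entry point; it calls the first handler normally. */
int64_t run(const uint8_t *program, int64_t initial) {
    assert(program != NULL);
    return handlers[*program](program + 1, initial);
}
```

Only the core handlers carry the INTERP_CC tag, while on_overflow stays an untagged regular function, which mirrors the point made above about tagging the interpreter functions rather than every fallback. Whether pc and acc actually stay in callee-saved registers across the on_overflow call depends on the compiler version and the change in the linked pull request.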
