Comment by haberman

1 year ago

Thanks for the pointer, I'll have to check this out.

Can you elaborate on how "return goto" is easier to implement? [[musttail]] also ends the lifetime of local objects AFAICS.

One thing I'll mention from a quick scan:

> [The] function called in tail position must have identical type to the callee. This ensures both that the return value does not require any conversion, and also that argument passing space is available and calling convention (if relevant) is maintained.

One complaint I've seen repeatedly about [[musttail]] (which I implemented in Clang) is that this constraint is unnecessarily strict, since some architectures will allow tail calls even for functions that do not perfectly match: https://github.com/llvm/llvm-project/issues/54964

"But then the code would be nonportable." True, but tail call optimization is inherently nonportable, since some targets fundamentally do not support tail call optimization (eg. WASM without the tail call extension).

>[[musttail]] also ends the lifetime of local objects AFAICS.

That's good to know. I had this github issue [0] in the back of my mind, as well as witnessing occasions of clang turning [[musttail]] into inner loops, and concluded clang's implementation must be more sophisticated than simply replacing calls with jumps. Just a little paranoia from trying to be serious with compiler dev[1]: fulfilling a laid-out spec feels more sound versus imitating something out there.

>this constraint is unnecessarily strict

I would agree, at least for x86 psABI, it can be pretty elaborative as long as the return value is the same register and argument stack don't exceed what's provided. Debug/profiling side might hate it, though.

[0] https://github.com/llvm/llvm-project/issues/72555 [1] https://github.com/fuhsnn/slimcc/

  • I certainly understand your caution. I don't have intimate expertise with the implementation of musttail in the backend -- when I implemented musttail in Clang, it was piggybacking on an existing attribute from the LLVM IR: https://llvm.org/docs/LangRef.html#call-instruction

    That said, my rough understanding is that a tail call ends the lifetime of all objects in the old stack frame. It follows that it is UB to access any objects from the previous stack frame after a tail call, and that would include Gerben's first example in https://github.com/llvm/llvm-project/issues/72555

    Your slimcc project looks really interesting, thanks for the pointer.

> some targets fundamentally do not support tail call optimization

Can't any tail call be rewritten as a loop? Couldn't a WASM compiler without the tail call extension implement it this way?

  • Yes, wasm2c implements the Wasm tail-call feature with trampolines, exactly this way. (https://github.com/WebAssembly/wabt/blob/main/test/wasm2c/ta... has an example.)

    Doing it with a trampoline is probably slower than if C really had tail calls. On the other hand, adding "real" tail calls to C would probably require changing the ABI (e.g. to "tailcc" or "fastcc -tailcallopt"), and I think there's some reason to think this would probably impose a penalty everywhere (https://llvm.org/docs/CodeGenerator.html#tail-call-optimizat...).

  • It can't be rewritten as loop due to function pointers. Using JS notation to avoid noise:

    function logAndCall(statement, func) { console.log(statement); return func(); }

    Tail call optimization is actually possible here since we call func in "tail position". It might be unlikely to blow up the stack, but it can definitely happen if you do a lot of continuation passing.

    Perhaps more commonly for C++/Rust, tail call optimization would be enormously valuable to have behind the scenes for destructors. It's actually very difficult to implement a linked list in safe Rust that doesn't explode the stack for large lists since no TCO is done for destroying subobjects. You can still avoid stack overflow, but you have to do things like manually enumerating the list.

    • I don't get the problem. The generated code for your example would return the same function pointer + evaluated args list to the wrapping trampoline (i.e. the code that contains the loop), where that loop expects each thing it invokes to return a sum type NextInvoke(func, args) | Return(value).

      And that's not a special case for passed-in function pointers; that's what a trampoline always does. Trampolines expect static-invoked tail-calls to be codegen'ed into function-pointer references too.

      1 reply →

  • Only with the aforementioned escape analysis. The function call stack frames serve a critical purpose in most nontrivial logic.

  • > Can't any tail call be rewritten as a loop?

    No. In general tail calls cannot be rewritten into loops.

    • More specifically, tail recursion is usually easy to turn into a loop. Tail calls can be difficult to turn into loops when they call a different function, or worse a function passed in as a variable.

      1 reply →