← Back to context

Comment by fuhsnn

1 year ago

A C standard proposal exists for tail call [0], in the form of "return goto (expression);".

What I like about it, over standardizing [[musttail]], is that lifetimes of local objects are guaranteed to end. This makes it possible to implement without extensive escape analysis.

[0] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3266.htm#5...

Thanks for the pointer, I'll have to check this out.

Can you elaborate on how "return goto" is easier to implement? [[musttail]] also ends the lifetime of local objects AFAICS.

One thing I'll mention from a quick scan:

> [The] function called in tail position must have identical type to the callee. This ensures both that the return value does not require any conversion, and also that argument passing space is available and calling convention (if relevant) is maintained.

One complaint I've seen repeatedly about [[musttail]] (which I implemented in Clang) is that this constraint is unnecessarily strict, since some architectures will allow tail calls even for functions that do not perfectly match: https://github.com/llvm/llvm-project/issues/54964

"But then the code would be nonportable." True, but tail call optimization is inherently nonportable, since some targets fundamentally do not support tail call optimization (eg. WASM without the tail call extension).

  • >[[musttail]] also ends the lifetime of local objects AFAICS.

    That's good to know. I had this github issue [0] in the back of my mind, as well as witnessing occasions of clang turning [[musttail]] into inner loops, and concluded clang's implementation must be more sophisticated than simply replacing calls with jumps. Just a little paranoia from trying to be serious with compiler dev[1]: fulfilling a laid-out spec feels more sound versus imitating something out there.

    >this constraint is unnecessarily strict

    I would agree, at least for x86 psABI, it can be pretty elaborative as long as the return value is the same register and argument stack don't exceed what's provided. Debug/profiling side might hate it, though.

    [0] https://github.com/llvm/llvm-project/issues/72555 [1] https://github.com/fuhsnn/slimcc/

    • I certainly understand your caution. I don't have intimate expertise with the implementation of musttail in the backend -- when I implemented musttail in Clang, it was piggybacking on an existing attribute from the LLVM IR: https://llvm.org/docs/LangRef.html#call-instruction

      That said, my rough understanding is that a tail call ends the lifetime of all objects in the old stack frame. It follows that it is UB to access any objects from the previous stack frame after a tail call, and that would include Gerben's first example in https://github.com/llvm/llvm-project/issues/72555

      Your slimcc project looks really interesting, thanks for the pointer.

  • > some targets fundamentally do not support tail call optimization

    Can't any tail call be rewritten as a loop? Couldn't a WASM compiler without the tail call extension implement it this way?

    • Yes, wasm2c implements the Wasm tail-call feature with trampolines, exactly this way. (https://github.com/WebAssembly/wabt/blob/main/test/wasm2c/ta... has an example.)

      Doing it with a trampoline is probably slower than if C really had tail calls. On the other hand, adding "real" tail calls to C would probably require changing the ABI (e.g. to "tailcc" or "fastcc -tailcallopt"), and I think there's some reason to think this would probably impose a penalty everywhere (https://llvm.org/docs/CodeGenerator.html#tail-call-optimizat...).

      2 replies →

    • It can't be rewritten as loop due to function pointers. Using JS notation to avoid noise:

      function logAndCall(statement, func) { console.log(statement); return func(); }

      Tail call optimization is actually possible here since we call func in "tail position". It might be unlikely to blow up the stack, but it can definitely happen if you do a lot of continuation passing.

      Perhaps more commonly for C++/Rust, tail call optimization would be enormously valuable to have behind the scenes for destructors. It's actually very difficult to implement a linked list in safe Rust that doesn't explode the stack for large lists since no TCO is done for destroying subobjects. You can still avoid stack overflow, but you have to do things like manually enumerating the list.

      3 replies →

    • Only with the aforementioned escape analysis. The function call stack frames serve a critical purpose in most nontrivial logic.

As an aside, I’m excited - but also with lots of trepidation - about the new energy in adding new functionality to C. Excited because there are changes and additions that are crying to be made, even clarifying ideas… trepidation because C++’s gung ho update cadence has somehow ended up in wart patched over wart. Especially when feature reacts with feature in an unfelicitous way far earlier than anticipated. I hope the standards process finds a way to be very conservative, really thoroughly test features in large and diverse codebases, rather than just relying on rationales alone, when choosing to include feature.

  • What new energy? We have had C89, C99, C11, C17, C23

    And yet, no fix in sight for proper strings, arrays, or at very least bounds checked slices, like Dennis Ritchie originally proposed to ANSI/ISO.