← Back to context

Comment by tines

1 year ago

> some targets fundamentally do not support tail call optimization

Can't any tail call be rewritten as a loop? Couldn't a WASM compiler without the tail call extension implement it this way?

Yes, wasm2c implements the Wasm tail-call feature with trampolines, exactly this way. (https://github.com/WebAssembly/wabt/blob/main/test/wasm2c/ta... has an example.)

Doing it with a trampoline is probably slower than if C really had tail calls. On the other hand, adding "real" tail calls to C would probably require changing the ABI (e.g. to "tailcc" or "fastcc -tailcallopt"), and I think there's some reason to think this would probably impose a penalty everywhere (https://llvm.org/docs/CodeGenerator.html#tail-call-optimizat...).

  • > On the other hand, adding "real" tail calls to C would probably require changing the ABI (e.g. to "tailcc" or "fastcc -tailcallopt")

    But [[musttail]] does exactly this while respecting existing calling conventions: https://clang.llvm.org/docs/AttributeReference.html#musttail

    • No -- as discussed upthread, clang's musttail attribute requires the target function to have the same number of arguments as the caller and for each argument to be similar to the corresponding caller argument. That's stricter than the underlying LLVM musttail marker (when targeting the tailcc/swifttailcc calling conventions) and is too restrictive to implement Wasm's tail-call feature (and probably Scheme's, etc.), at least if arguments are getting passed to functions natively.

      It would be nice if the more relaxed rules of the LLVM musttail marker with tailcc could be exposed in clang (and gcc). I think that's basically what "return goto" would do.

It can't be rewritten as loop due to function pointers. Using JS notation to avoid noise:

function logAndCall(statement, func) { console.log(statement); return func(); }

Tail call optimization is actually possible here since we call func in "tail position". It might be unlikely to blow up the stack, but it can definitely happen if you do a lot of continuation passing.

Perhaps more commonly for C++/Rust, tail call optimization would be enormously valuable to have behind the scenes for destructors. It's actually very difficult to implement a linked list in safe Rust that doesn't explode the stack for large lists since no TCO is done for destroying subobjects. You can still avoid stack overflow, but you have to do things like manually enumerating the list.

  • I don't get the problem. The generated code for your example would return the same function pointer + evaluated args list to the wrapping trampoline (i.e. the code that contains the loop), where that loop expects each thing it invokes to return a sum type NextInvoke(func, args) | Return(value).

    And that's not a special case for passed-in function pointers; that's what a trampoline always does. Trampolines expect static-invoked tail-calls to be codegen'ed into function-pointer references too.

    • A trampoline is less efficient theoretically because it implies that you must check a condition and that you can't make an unconditional tail-call. It's also not quite equivalent since it's a non-local compilation technique requiring the caller to do something differently.

      It's completely different than the story with tail recursion which can be essentially reduced to syntactic sugar over a normal loop (just change the parameters to be mutable variables).

Only with the aforementioned escape analysis. The function call stack frames serve a critical purpose in most nontrivial logic.

> Can't any tail call be rewritten as a loop?

No. In general tail calls cannot be rewritten into loops.

  • More specifically, tail recursion is usually easy to turn into a loop. Tail calls can be difficult to turn into loops when they call a different function, or worse a function passed in as a variable.

    • To give the standard example:

      Consider a state machine where each state is a function, and you transition into a different state by tail-calling another function.

      State machines can have arbitrarily complicated graphs, that would be hard to put into a simple loop.

      (However, you can do something like a 'trampoline' to remove the mutual recursion.)