← Back to context

Comment by irdc

1 year ago

I wonder how fast this would be when using a trampoline, i.e. returning the next function as a function pointer and calling that from an outer loop. That would have the advantage of being portable C.

C is often used as target language for compiler from higher level languages. The Scheme programming language requires all tail calls not to grow the stack. Therefore implementors have explored various techniques including trampolines. I don't have a citation, but you can find the answer in the papers on compiling Scheme to C. If there is no guarantee of TCO in the target language, then the generated programs will be slower.

Incidentally, this is why implementors of (especially) high level languages are annoyed that TCO was removed from the JavaScript specification. There are even solution for having TCO and still have stack inspection.

https://github.com/schemedoc/bibliography/blob/master/page8....

the reason i suspect tail call optimization is fast would be because the resultant loop is predictable and thus CPU instruction prefetch and memory prefetch works very well.

Jumping via function pointers would probably not be as predictable, and you'd unlikely to see the same benefit.

Of course, one must measure, and i haven't.

  • The tail call interpreter is also calling through a function pointer. The cost here is purely the call+ret overhead, which can be non-trivial when it is per opcode; on some micro-architectures there is also a limit on taken jumps per cycle (sometimes as low as one taken jump every other cycle).

    edit: trampolining would also collapse all indirect jumps to a single source address which is not ideal for branch prediction.