Comment by chii
1 year ago
the reason i suspect the tail call interpreter is fast is that the resulting loop is predictable, so CPU instruction prefetch and memory prefetch work very well.
Jumping via function pointers would probably not be as predictable, and you'd be unlikely to see the same benefit.
Of course, one must measure, and i haven't.
The tail call interpreter is also calling through a function pointer. The cost here is purely the call+ret overhead, which can be non-trivial when it is per opcode; on some micro-architectures there is also a limit on taken jumps per cycle (sometimes as low as one taken jump every other cycle).
edit: trampolining would also collapse all indirect jumps to a single source address, which is not ideal for branch prediction — indirect branch predictors key on the address of the branch instruction, so one shared dispatch site muddles the per-opcode history that separate tail-call sites would give.