Comment by haberman
5 years ago
Tail calls are a very common optimization; both Clang and GCC have been performing it successfully for a while. What is new is getting a guarantee that applies to all build modes, including non-optimized builds.
If you're interested in this optimization for performance reasons, why would you want an otherwise non-optimized build? It seems that the only important case is the optimized build... where for some reason you're not getting this optimization without explicitly asking for it.
So the question remains... why does the compiler miss the chance to perform this tail-call optimization unless the call is explicitly marked?
If the optimization is not performed, you blow the stack because your stack usage is now O(n) instead of O(1).
That's why it's important to get the optimization in debug mode.
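A minimal sketch of that failure mode, assuming Clang's musttail statement attribute (the numbers are purely illustrative):

```c
/* Sums 1..n via tail recursion. At -O0 without guaranteed tail calls,
 * each call keeps its own frame (O(n) stack) and a large n overflows
 * the stack; with the musttail attribute the call must reuse the
 * current frame, so stack usage is O(1) even in an unoptimized build. */
static long long sum_to(long long n, long long acc) {
    if (n == 0)
        return acc;
    __attribute__((musttail)) return sum_to(n - 1, acc + n);
}

int main(void) {
    /* Tens of millions of frames would blow a typical 8 MB stack
     * at -O0 without the guarantee. */
    return sum_to(50000000, 0) > 0 ? 0 : 1;
}
```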
Are you saying that the achievement outlined in the article is making sure that this optimization is turned on in debug mode?
Because I thought the entire point of the article is that they achieved a marked speedup in production mode, not in debug mode. And I've yet to see (what must be obvious to everyone else) why the compiler misses this obvious optimization in production mode without the explicit markup.
And if the compiler is not missing this optimization in production mode, why is the article saying they've sped up their code by explicitly marking it for optimization?
What's the argument for building in debug mode vs specifying -g to the compiler to build with debug symbols?
I've previously encountered bugs that stubbornly refused to reproduce in a non-optimized build.
One thing that comes to mind is `#ifdef DEBUG`-style macros, although I'm curious if there's anything else I'm missing. Omitted stack frames, e.g. from inlining, perhaps?
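For what it's worth, a sketch of the kind of `#ifdef DEBUG` divergence I mean (the macro and function names are invented, not from any particular codebase):

```c
#include <stdio.h>

/* Checks and logging that exist only in the debug configuration;
 * adding -g to an optimized build would not enable them. */
#ifdef DEBUG
#define DLOG(msg) fprintf(stderr, "debug: %s\n", (msg))
#define DCHECK(cond) do { if (!(cond)) { \
    fprintf(stderr, "check failed: %s\n", #cond); } } while (0)
#else
#define DLOG(msg) ((void)0)
#define DCHECK(cond) ((void)0)
#endif

static int parse_field(int tag) {
    DCHECK(tag > 0);           /* compiled out in release builds */
    DLOG("parsing field");
    return tag & 7;            /* wire type, say */
}

int main(void) { return parse_field(10) == 2 ? 0 : 1; }
```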
One reason this is not done by default is that it can make debugging surprising: since a tail-calling function does not leave a stack frame behind, stack traces are much less useful. This is not so bad for a tail-recursive function that calls itself, but if proper tail calls happen between multiple functions, you could have A call B, B tail call C, and C tail call D; if D crashes, the stack trace is just (D, called from A).
I’m sure there are some good ways around this problem, and I would love to have proper tail calls available in most languages.
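A sketch of what that looks like in practice, assuming Clang's musttail attribute (function names made up; the program deliberately aborts to show the backtrace):

```c
#include <stdlib.h>

static int d(int x) { (void)x; abort(); }                      /* crash site */
static int c(int x) { __attribute__((musttail)) return d(x); }
static int b(int x) { __attribute__((musttail)) return c(x); }
static int a(int x) { return b(x); }                           /* ordinary call */

int main(void) {
    /* In an unoptimized build, a backtrace at the abort shows d, a,
     * and main: b and c were required to reuse the same frame, so
     * they never appear. */
    return a(1);
}
```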
In OP's case, the individual functions are replacing small fragments of code that would have been basic blocks inside a larger function. The tail calls are replacing gotos. Those things also wouldn't have produced a stack trace.
There are other reasons why this style of programming isn't the default. It requires restructuring the program around tail calls. You commandeer the registers that are normally used for function parameters and force the compiler to use them for your most important variables. But it also means that calling an ordinary function competes for those same registers, so this technique works best when the inner loop you're optimizing doesn't make many normal function calls in the fast path (before you break it down into a network of tail-called functions).
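Roughly the shape of it (a simplified sketch with invented names and a stubbed-out decode step, not the real parser code):

```c
#include <stddef.h>
#include <stdint.h>

typedef struct Parser Parser;

/* Every handler shares one signature, so the hot state (parser, input
 * pointer, end pointer, current tag) stays pinned in the argument
 * registers across every transfer. */
static const char *parse_done(Parser *p, const char *ptr, const char *end,
                              uint64_t tag);
static const char *parse_field(Parser *p, const char *ptr, const char *end,
                               uint64_t tag);

/* Each former basic block becomes a small function, and the goto to
 * the next block becomes a guaranteed tail call: no frames pile up. */
static const char *dispatch(Parser *p, const char *ptr, const char *end,
                            uint64_t tag) {
    if (ptr >= end)
        __attribute__((musttail)) return parse_done(p, ptr, end, tag);
    tag = (uint8_t)*ptr++;
    __attribute__((musttail)) return parse_field(p, ptr, end, tag);
}

static const char *parse_field(Parser *p, const char *ptr, const char *end,
                               uint64_t tag) {
    /* ...decode one field here... */
    __attribute__((musttail)) return dispatch(p, ptr, end, tag);
}

static const char *parse_done(Parser *p, const char *ptr, const char *end,
                              uint64_t tag) {
    (void)p; (void)end; (void)tag;
    return ptr;
}

int main(void) {
    const char buf[] = {1, 2, 3};
    return dispatch(NULL, buf, buf + sizeof(buf), 0) == buf + sizeof(buf) ? 0 : 1;
}
```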
I must be really missing something obvious, since I don't know why so many replies are talking about debugging.
We're talking about the speed of production code (aren't we?) that already elides a lot of debugging features. The article doesn't say they've found a feature for faster debugging; it says that explicitly marking tail calls makes their production code faster... so I'm still lost as to why the compiler can't find these optimizations in production builds without explicit markup.
If you write this style of code and don't get the optimization, then your test code and debug sessions are dog slow: much, much slower than the old giant-switch-style parser.
Similarly, if you make a mistake that would cause the optimization not to be applied, you'd rather get a compiler error than suddenly have a 10X performance regression.
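Concretely, that's what the attribute buys: if the call can't actually be compiled as a tail call, Clang refuses to build it rather than silently falling back to a normal call. A contrived sketch of such a mistake (names made up; this is intended to fail to compile):

```c
static long helper(long x) { return x + 1; }

static int broken(int x) {
    /* The parameter and return types don't match the caller's, so this
     * cannot be a guaranteed tail call; Clang rejects it at compile
     * time instead of quietly emitting a normal, stack-consuming call. */
    __attribute__((musttail)) return helper(x);
}
```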