Why tail-recursive functions are loops

4 days ago (kmicinski.com)

Practically the day after I learned about tail recursion in CS class, I learned that almost all recursive calls can be translated to iteration, that in many cases the iterative version is easier to scan, is as fast if not faster, and that it can usually handle much larger inputs than recursion because it avoids stack overflow.

Tail recursion is meant to fix the latter. But what we mean to happen and what actually happens ain't ever exactly similar.

Tail recursion IME is a bigger foot gun than relying on someone to add a new conditional branch at the end of a block in an iterative algorithm without fucking it up in the process. And iteration responds generally better to Extract Function. And while I can think of counter cases easily enough, in the large iteration is less work and less vigilance. And you cannot scale a project up without the vigilance requirement amortizing basically to 0 per line of code.

  • > Tail recursion IME is a bigger foot gun

    This is true for some languages, but not all.

    E.g. Scala has @tailrec annotations, which make it a compile error for the annotated function to not be tail recursive. Clojure doesn't have tail call elimination, but has the `recur` special form for explicit recursion that is guaranteed to not consume any stack space. Rust has reserved the `become` keyword that will eventually guarantee tail call elimination (so pretty similar to Clojure's recur, but I think Rust's version will allow mutual recursion).

    Zig goes the whole hog, and has `@call(modifier, fn, args)`, where `modifier` can be things like compile_time, always_tail, always_inline, never_tail, never_inline, and a bunch of other desirable guarantees you might want.

  • In my mind, the biggest advantage to using tail recursion over vanilla loops is the ability to keep using persistent data structures without any (apparent) mutation.

    At least in theory, a tail recursive call will be converted into a dumb jump, so there shouldn't be a performance penalty, but since from your code's perspective you're passing in the stuff for the next iteration, you can keep using the pretty and easy-to-reason-about structures without creating any kind of mutable reference.

    I'm not 100% sold on tail recursion in a broader sense, but at least with Clojure's loop/recur stuff it is kind of cool to be able to keep using persistent data structures across iterations of loops.
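
    A rough sketch of that shape in Python (illustrative only; CPython does not eliminate tail calls, so this shows the style rather than the optimization): each "iteration" hands a fresh immutable tuple to the next call instead of mutating anything in place.

        def squares(xs, i=0, acc=()):
            # every step passes a new tuple; nothing is ever mutated
            if i == len(xs):
                return acc
            return squares(xs, i + 1, acc + (xs[i] * xs[i],))

        squares((1, 2, 3))   # => (1, 4, 9)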

    • What makes tail recursion "special" is that there exists a semantically equivalent mutable/iterative implementation to something expressed logically as immutable/recursive. [0]

      Of course, this means that the same implementation could also be directly expressed logically in a way that is mutable/iterative.

      func pow(uint base, uint n): n == 0 ? return 1 : return base * pow(base, n-1)

      is just

      func pow(uint base, uint n): uint res = 1; for(i=0; i<n; i++){ res *= base } return res

      There is no real "advantage" to, or reason to "sell" anybody on tail call recursion if you are able to easily and clearly represent both implementations, IMO. It is just a compiler/runtime optimization, which might make your code more "elegant" at the cost of obfuscating how it actually runs + new footguns from the possibility that code you think should use TCO actually doesn't (because not all immutable + recursive functions can use TCO, only certain kinds, and your runtime may not even implement TCO in all cases where it theoretically should).
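
      (Strictly speaking, the recursive pow above is not yet in tail position, since the multiply happens after the call returns; the tail form threads an accumulator. A minimal Python sketch, names mine:)

          def pow_acc(base, n, acc=1):
              # the recursive call is the very last thing evaluated, so a
              # compiler with tail call elimination can turn it into a jump
              if n == 0:
                  return acc
              return pow_acc(base, n - 1, acc * base)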

      As an aside, in C++ there is something very similar to TCO called copy-elision/return-value-optimization (RVO): [1]. As with TCO it is IMO not something to "buy into" or sell yourself on, it is just an optimization you can leverage when structuring your code in a way similar to what the article calls "continuation passing style". And just like TCO, RVO is neat but IMO slightly dangerous because it relies on implicit compiler/runtime optimizations that can be accidentally disabled or made non-applicable as code changes: if someone wanders in and makes small semantic changes to my code relying on RVO/TCO for performance they could silently break something important.

      [0] EXCEPT in practice all implementation differences/optimizations introduce observable side effects that can otherwise impact program correctness or semantics. For example, a program could (perhaps implicitly) rely on the fact that it errors out due to stack overflow when recursing > X times, and so enabling TCO could cause the program to enter new/undesirable states; or a program could rely on a function F performing X floating point operations in at least Y cycles and at least Z microseconds, and not function properly when F takes less than Z microseconds after enabling vectorization. This is Hyrum's Law [2].

      [1] https://en.wikipedia.org/wiki/Copy_elision#Return_value_opti...

      [2] https://www.hyrumslaw.com/

      6 replies →

  • > Practically the day after I learned about tail recursion in CS class, I learned that almost all recursive calls can be translated to iteration, that in many cases the iterative version is easier to scan, is as fast if not faster, and that it can usually handle much larger inputs than recursion because it avoids stack overflow.

    What do you mean by easier to scan? I find (most) loops [0] hard to read, because they typically involve mutable state.

    When properly implemented, tail calls are as fast as gotos and loops, and don't blow up any stack. (Not all languages are implemented with a stack in any case.)

    However you have one point:

    Most of the time, we don't use recursion directly in our programs even in a language like Haskell or Scheme. Instead we define a 'combinator' that encapsulates the specific, limited recursion scheme that we want, and then use that one. This is very similar to how people generally don't use goto directly in their programs.

    You might be familiar with the combinators 'map', 'filter' and perhaps 'reduce'/'foldr'. You could re-interpret the various common loop types as such recursion combinators that weaker languages have to bake into their language, because they are not strong enough to express them as a user-defined library. And indeed, Haskell has Control.Monad.Loops (https://hackage.haskell.org/package/monad-loops-0.4.3/docs/C...) which gives you exactly the common loop types as a library.

    Some examples of less common combinators are eg 'unfold' or various combinators to traverse trees or graphs.
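
    For instance, 'unfold' could look roughly like this in Python (a hypothetical sketch, not any particular library's API): it grows a sequence from a seed, the mirror image of folding one up.

        def unfold(step, seed):
            # step(seed) returns (value, next_seed), or None to stop
            out = []
            while True:
                r = step(seed)
                if r is None:
                    return out
                value, seed = r
                out.append(value)

        unfold(lambda n: (n, n - 1) if n > 0 else None, 5)   # => [5, 4, 3, 2, 1]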

    [0] The foreach-loop over some sequence is less headache inducing than eg a while-loop.

  • I am in the other camp. I prefer tail recursion and recursion over loops. However: For the simple cases it can and should probably be abstracted away like the racket for loops or my own goof-loop [1].

    I just feel that recursive calls make state much more clear, but then again I am no fan of mutation in general. In my old python days I think a good 60% of my bugs were due to me being bad at managing state.

    [1] https://rikspucko.koketteriet.se/bjoli/goof-loop

    • I'm in the same boat, recursion tends to be easier for me to reason about because I'm expressing the problem in terms of some base case that incoming parameters are being reduced to rather than some condition that an iterative approach is working towards.

      1 reply →

  • There are cases where iteration is clearer, and there are cases where recursion is clearer.

    It's well worth being familiar with both - if you learn how to shoehorn both approaches where they aren't ideal, your judgement on avoiding such practices will improve. :)

  • I have my doubts about any CS class/lecture that teaches that the "iterative version is easier to scan". Might just be the bias or inexperience of the lecturer. By now I find recursive code often easier to read than some for loop with its loop header and counter that I need to think about and update in my mind. And then in many languages the loop does not even have a return value, because it is not an expression, but a statement. Meeehhh, very meehhh in many languages. Not all, but many.

    I think maybe in languages like Ruby or Smalltalk a loop can be more readable, because of how they structure it as messages to objects, rather than special keywords in the language.

    • > I have my doubts about any CS class/lecture that teaches that the "iterative version is easier to scan".

      Well then you’re in luck because that was not an academic statement but a professional opinion - mine.

      You can’t optimize the patterns you can’t see because they’re obscured by accidental complexity. And iteration is a frequent trick I use to surface deeper patterns.

      1 reply →

  • I must have missed this class. How does one convert a recursive descent parser into an iterative one?

    • Every recursive algorithm can be made into an iterative algorithm if you use an explicit stack instead of the implicit call stack. It's not always cleaner.

      In tail recursive algorithms, there is no stack, it's just a straight loop.

        def foo(state, acc): # if acc is needed
          if base_condition(state): return acc
          return foo(next(state), f(state, acc))
      

      Is simply:

        def foo(state):
          acc = initial
          while not base_condition(state):
            acc = f(state, acc)
            state = next(state)
          return acc
      

      If it's not tail recursive you introduce a stack, for instance a DFS on a binary tree:

        def search(node, val):
          if node is None: return False # empty tree, only need this check once
          stack = [node]
          while stack:
            n = stack.pop()
            if n.val == val: return True
            if n.right: stack.append(n.right)
            if n.left:  stack.append(n.left)
          return False
      

      Note the insertion order is reversed from the recursive calls in a typical DFS. We want left to be searched first and then its children and then we "recur" back to right and deal with its children, so we need to push right into the stack first and then left.

      When you have multiple mutually recursive functions (as is likely the case in a recursive descent parser) then things become more complicated, but it's still feasible.

      12 replies →

    • You can do it by managing your own stack, but I would argue that doing so makes the algorithm LESS easy to scan and has its own sets of problems.

    • Same way as any other algorithm: with an explicit heap-allocated stack. I have a naive parser where I built operator precedence into the grammar as nested productions instead of using an algorithm like shunting yard. An input of ((((((1)))))) blew the stack. Converted it to an explicit stack and it was fine; deep for the call stack was not deep for my own heap-allocated stack. Nasty code, though--I think this serves as one of the counterexamples to the idea that the recursive code gets simpler when turning it into iteration. The original OP was talking more specifically about tail recursion, which a recursive descent parser won't be.

      1 reply →

  • Recursion deals with recursive data structures, and iteration deals with "plain old" data structures.

    When you use one to process the other, you get into trouble. For example, you need to manage a stack to do iteration on your binary tree. Or you need to construct slices to recurse on your arrays.

    It's all about ergonomics.

    • > For example, you need to manage a stack to do iteration on your binary tree

      Recursion around trees can get you into trouble with the stack. Consider:

          def traverse(node):
            if node is None: return
            do_work(node.payload)
            traverse(node.left)
            traverse(node.right)
      

      The second recursive call to traverse is in tail position and is a candidate for elimination, but the first one isn't and will _always_ eat stack space. This is fine if you know what you're doing, but can bite you in the arse if you're expecting TCO to save you.

      1 reply →

    • What's a plain old data structure?

      Linked lists are recursive, but you can use loops just fine to deal with them.

      Similarly, it depends on what you do with your trees. If you want to go down a single path (eg to find an element in a search tree), loops aren't too bad.

      And arrays don't necessarily ask for loops. Eg implementing quicksort or heap sort with loops is possible (and depending on your language, compiler and target machine might have performance benefits), but I'd hardly call it ergonomic.

      Despite all my criticism you are hinting at a good point though: it's often useful to define your data structures together with combinators that work on them. So eg you want to define various tree traversals together with your tree. (Most people are familiar with the combinators 'map' and 'filter' for lists but that's a very small pool.)

      Whether those combinators are implemented with recursion or loops is an implementation detail that the user of the combinator doesn't need to worry about too much.
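
      To make that concrete, here is a sketch (assuming a simple Node class with left/val/right attributes): the traversal combinator below happens to be written with an explicit stack, but its callers only ever see a plain sequence of values.

          def inorder(node):
              # yields values left-to-right; the stack never leaks to callers
              stack = []
              while stack or node is not None:
                  while node is not None:
                      stack.append(node)
                      node = node.left
                  node = stack.pop()
                  yield node.val
                  node = node.right

          # usage: sum(inorder(root)), list(inorder(root)), etc.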

    • This is overly simplistic.

      Is a binary tree recursive from the perspective of a type declaration? Yes.

      Is it recursive from the point of view of search? Depends. If you're doing depth-first search, then you need a stack and a recursive algorithm seems natural. If you're doing breadth-first search, then you need a queue and the algorithm is less obviously recursive.

      In either case the program could be expressed recursively or as an explicit loop. When a stack is needed, the underlying hardware always provides automatic stack management so recursion feels natural in that case.

      When a queue is needed it's more of a tossup since hardware and programming languages don't generally provide automatic queue management, so you're going to need to manage that data structure manually anyway. In which case whether you choose to use recursion or not is more of a personal preference.
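
      For example, a breadth-first search over the same kind of binary tree might look like this (a sketch, assuming nodes with left/right/val): the queue is managed by hand whether or not you phrase the rest recursively.

          from collections import deque

          def bfs_contains(root, target):
              if root is None:
                  return False
              queue = deque([root])
              while queue:
                  node = queue.popleft()          # FIFO: visit level by level
                  if node.val == target:
                      return True
                  if node.left:  queue.append(node.left)
                  if node.right: queue.append(node.right)
              return False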

      6 replies →

  • I'm in your camp, recursive code is hard for the next reader, which might be you, to understand. It's a bit too magic looking for my taste. If you're looping it should look like a loop, if you're pushing onto a stack you should get to see the stack. It's also safe against modifications which might silently break the tail-call nature of it until you blow out the stack later. It also gives you much saner and more debuggable stack traces because all the state is in the current frame.

    • I never quite understood this recursion is hard/magic sentiment. Maybe it's because I started my CS education when it was still taught out of math departments or because it started with actually using programming languages to do algebra. Then again, the FP Kool-Aid is practically the blood in my veins at this point I guess.

      3 replies →

  • > IME is a bigger foot gun than

    Pretty much true for any functional feature. Great in the classroom, less practical in the real world.

    • Trying to be as functional as possible I think has great benefits in the "real world". Functions are very easy to unit test, for one.

      Pure functional is difficult, but breaking out the parts of the program which can be pure functional is generally a smart thing to do.

      If I had the drive and ability to make a programming language from scratch, it would be hybrid imperative/functional, and "pure" functions (no side effects, EXCEPT possibly debug logging) would be clearly marked as such.

Several people in this thread are saying that tail recursion is superior to imperative iteration in that it explicitly specifies which variables may change in each iteration (assuming a broadly immutable language).

To the contrary, I'd argue that immutability isn't the only alternative to universal mutability: many newer imperative languages (such as Rust) have various forms of controlled mutability, where one must explicitly declare which variables may be modified by imperative assignments.

IME, controlled mutability captures just about all the benefit of immutable languages, without requiring any special data structures or sufficiently-smart compiler analyses to ensure good performance. I've never really understood the desire for tail-recursive versions of iterative algorithms, except for a prior commitment to functional programming.

  • > I've never really understood the desire for tail-recursive versions of iterative algorithms, except for a prior commitment to functional programming.

    Tail-recursion and iteration are broadly equivalent right? So picking one over the other is a matter of style and capability.

    You can't use iteration if your language isn't capable of it, but no worries, you can use tail recursion.

    Similarly, you can't use tail recursion if your language isn't capable of it, but no worries, you can use iteration, at least for iterative tail recursion. Otoh, there are uses of tail call elimination that aren't recursion (or aren't direct recursion) ... that can get awkward if your language can't manage it.

  • But Rust's semantics make it less ergonomic to pass values, and especially once you do any mutation, that code path is no longer trivially up for parallelization/concurrency. There one will have to lean into what Rust offers to make it safe again, which brings along more syntax. When one wants to pass values, one needs to implement clone or copy or something, and then explicitly clone or copy.

    It is a tradeoff one can make, and it lends itself to high performance but it does come at a cost.

  • Loops and gotos are just a special case of function calls that needed to be invented because back in the olden days we had no clue how to design languages and write compilers.

    I don't understand why someone would want to hold on nostalgically to restrictions we no longer face.

    Controlled mutability is a step up, yes. Btw, even languages like Haskell give you plenty of mutability to play with, if you want to. (Eg MVars and TVars are a thing.)

    You are right that replacing a loop one for one with a tail recursive version doesn't give you much benefit either way. (Though it does make my life easier, because I don't have to contort my brain around keeping track of mutation.)

    However that's a bit of a straw man and misses the broader picture.

    What you'd want to do is define combinators that encapsulate the specific pattern of computation you want to implement. The most commonly known combinators are things like 'map' and 'filter'. But there are plenty more useful ones you can define, like various tree traversals or whatever makes sense for use cases and data structures.

    Whether those combinators are implemented with tail calls or with loops or gotos or whatever is an implementation detail that their users don't need to worry about.

    I know of one major use case where you'd want to use tail recursion directly: state machines and interpreters. See https://en.wikisource.org/wiki/Lambda:_The_Ultimate_GOTO for an example of the former.
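
    As a tiny illustration of the state-machine case (a sketch of the idea only; CPython has no tail call elimination, so a trampoline stands in for what proper tail calls would give you for free): each state "tail-calls" the next state by returning a thunk for it.

        def even_ones(bits):
            # accepting state: an even number of 1s seen so far
            if not bits:
                return True
            head, rest = bits[0], bits[1:]
            return lambda: (odd_ones if head == 1 else even_ones)(rest)

        def odd_ones(bits):
            if not bits:
                return False
            head, rest = bits[0], bits[1:]
            return lambda: (even_ones if head == 1 else odd_ones)(rest)

        def run(state, bits):
            # trampoline: keep forcing thunks until a final answer appears
            result = state(bits)
            while callable(result):
                result = result()
            return result

        run(even_ones, [1, 0, 1, 1])   # => False (three 1s)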

I'm a pretty big functional programming nerd and I want to like tail recursion, but I honestly kind of feel like I agree with Guido on it, which is that it kind of breaks the typical stack-trace patterns.

I have kind of grown to prefer Clojure's loop/recur construct, since it gives you something more or less akin to tail recursion but it doesn't pretend to be actually recursive.

  • Clojure does it this way because the JVM stupidly doesn't support tail call optimization.

    It is true that TCO messes up your stack traces. It is also true that loops mess up your stack traces, because they don't even create a stack trace. In many languages that support TCO, TCO can be turned off for debugging and enabled for production, so Guido's characterization of the issue is ridiculous.

    • I know that's the origin of why Clojure does its manual loop/recur thing, but I've grown to kind of prefer its explicitness. There's no ambiguity on whether it will TCO; it simply won't compile if it's not tail-call.

      I don't agree that being able to turn off TCO in debugging is enough to call Guido's characterization "ridiculous". I often have to debug running production systems, and those are (hopefully) not deployed with debug compilations.

      You're not wrong about loops not creating a stack trace; I guess what bothers me (and maybe Guido) is that with loops there's no expectation of a stack trace, while with a function call there is.

    • > It is also true that loops mess up your stack traces

      No, because an indirect tail-call eliminates the calling frame. After, it looks like the caller of that function called a function it did not call. In fact, with tail-calls a whole pile of steps gets telescoped down to nothing[1]. This is not the case with a loop, it doesn't jump to itself indirectly. Loops don't jump into the middle of other loops in other functions. We can still follow the chain of control flow from the start of the main function, through (non-tail) calls, i.e. the stack, through the start of the loop, and then from some backedge in the loop body we're looking at.

      Tail-calls are absolutely harder to debug in general than loops.

      [1] In the limit, if the entire program was transformed into CPS with tail-calls everywhere, the original execution stack is now strewn throughout the heap as a chain of closures.

      1 reply →

    • Although, a correctly shown stack trace for a recursive function is also pretty “messed up” in some sense. Who wants an O(n) stack trace? I guess most tools are not going to display it nicely, haha.

More exciting in my mind than the equivalence between tail recursion and while loops is the equivalence between left folds and for loops. A for loop in its most general shape is something like

    <whatever state the application is in here serves as the starting state>
    foreach (i in range(n)) {
        <the state of the application is updated with each iteration>
    }
    <the desired state of the application has been applied when the loop is done>

An equivalent left fold is invoked like

    <desired state> = foldl (
        <state updating procedure>, 
        <starting state>,
        range(n)
    )

The difference is that the fold is given the starting state explicitly and returns the desired state as a value. This makes it easier to troubleshoot than the implicit state transitions, but it's essentially still the same thing.
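
A concrete instance of the same correspondence in Python (illustrative; functools.reduce is a left fold):

    from functools import reduce

    # loop form: the running state is updated in place each iteration
    total = 0
    for i in range(5):
        total = total + i

    # fold form: the starting state is passed in, the final state comes back
    total = reduce(lambda acc, i: acc + i, range(5), 0)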

Every developer should know how to write a tail-recursive function and understand why it is equivalent to a loop. That said, tail recursion is rarely needed in modern functional programming. For example, why write out a recursive function when a call to `fold` or `map` will do the same thing?

  • I agree with you--that's a topic I will definitely cover in my blog, too. You make a good point: I know some folks who worked at big financial orgs, writing hundreds of thousands of lines of code, and never wrote general-recursive functions (only used simple recursors like foldl).

  • it's not entirely true that it does the same thing, even if it gives the same result. For many programming languages, fold and map can only act on non-lazy data structures, so they require O(N) memory for the data that needs to be folded over, while tail-recursive functions usually only use O(1) memory, including stack memory.

    A notable exception to the above is Python 3 with generators, which I believe truly use O(1) memory with map and fold. Haskell has lists that are lazy by default, but if you fold or map over them, it still "forces the thunk" for each element and thus you still end up using O(N) memory.
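
    For instance (an illustrative snippet, not from the comment): reduce over a lazy map of a range pulls one element at a time, so memory use stays flat no matter how long the sequence is.

        from functools import reduce
        from operator import add

        # range and map are both lazy; reduce consumes one element at a time,
        # so the ten million squares are never materialised as a list
        reduce(add, map(lambda x: x * x, range(10_000_000)), 0)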

    • * I don't think you meant to compare forcing each element (as opposed to forcing the input or output data structures)

      * If you did mean it that way, I doubt Python can avoid forcing each element that goes through its generators. (Without, say, thunking up each element manually or using a generator of generators, etc.)

      Here is Haskell using a fold and a map. It does not force the input, the output, or each element:

        main =
      
          let lazyInput = zip ([1..] :: [Int])
                              (repeat (error "boom"))
      
              lazyOutput = foldr (:) [] lazyInput
      
          in print (sum (map fst (take 10_000_000 lazyOutput)))
      
        > 50000005000000
        > 9 MiB total memory in use
        > real 0m0.062s

    • I would hope that most standard libraries are optimized to avoid this sort of waste as much as possible. In F#, for example, I know that the standard `map` and `fold` are implemented imperatively for maximum performance.

    • ...although list fusion often saves you if the thunks were going to be forced eventually anyway.

  • I wouldn't say "rarely", unless you have a whole host of other higher order functions at your disposal for more special cases than map and fold. There are many cases, where you don't want to fold or map over the whole data structure and want to exit early with a result already. Writing tail recursive functions is still very common.

    • > I wouldn't say "rarely", unless you have a whole host of other higher order functions at your disposal for more special cases than map and fold

      That is my point. Modern functional languages do have a host of higher-order functions for exactly this sort of thing. For example, here is F#'s `Seq` module, for working with lazy sequences: https://fsharp.github.io/fsharp-core-docs/reference/fsharp-c...

      > There are many cases, where you don't want to fold or map over the whole data structure and want to exit early with a result already. Writing tail recursive functions is still very common.

      I think this can usually be handled more concisely by combining higher-order functions instead. For example, if you want to fold over a partial data structure, you can use `filter` to select only the elements you want, and then fold over that subset. Or, if you want to exit early from a map, you can use `takeWhile` and only map over what's left.

      Real-world functional programming is usually about combining these built-in tools effectively, rather than writing new tools from scratch.
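
      For instance (a sketch with itertools): summing elements only up to the first negative one, with no hand-written recursion and no explicit early exit.

          from itertools import takewhile

          xs = [3, 1, 4, -1, 5, 9]

          # cut the sequence first, then fold over what's left; takewhile is
          # lazy, so nothing past the first negative element is even examined
          sum(takewhile(lambda x: x >= 0, xs))   # => 8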

      1 reply →

  • > For example, why write out a recursive function when a call to `fold` or `map` will do the same thing?

    Yeah this was a big help when I started F#. Basically "if you're using the rec keyword, you're probably missing something" and hell that even goes for a lot of uses of fold, from the beginner's perspective.

Tail recursive functions are loops when tail calls are jumps.

Tail calls being jumps is the key insight.

Not all tail calls are recursion!

Students are confused by recursion. Some may be confused by tail calls.

Don't mix two confusing things together when teaching.

(Tail calls get mixed up with recursion because recognizing and treating/optimizing recursive tail calls to create loops is an important, highlighted category. In a compiler that doesn't optimize tail calls in general, optimizing self tail calls is an important priority (if that is easier to do than a general treatment). Many common algorithms are expressible in just one function that calls itself, and there are cases in which some or all of those calls can be tail calls; e.g. binary tree traversal.)

The order should probably be: teach recursion first. Then, when students have firmly wrapped their heads around it, introduce tail calls. Students will gain the skill of knowing what a tail call is and why it is important, and of recognizing which calls are tail calls. Then comes the harder skill of taking recursion that uses non-tail calls and restructuring it to make tail calls instead.

In Swift, we use guard clauses a lot. I think that guard can compile well (not just in recursion). This posting helps me to understand why.

Everybody knows that "goto is harmful" but who has noticed that the very same paper says that loops are technically unnecessary?

Loops are merely a special case of recursion. The reason languages have loops is that reifying these cases to language constructs simplifies code generation. The downside is that it muddles the logic of computation.

Loops are a special case of tail recursion which is a special case of recursion [0] which is a special case of calling functions.

However, loops aren't the only use for tail recursion. Another classic example are state machines as described in the 'Lambda: the ultimate goto' paper. See eg https://en.wikisource.org/wiki/Lambda:_The_Ultimate_GOTO

[0] Well, recursion on functions. You can also have eg recursive data types or recursion in mathematics in general. But let's focus on functions.

  • I think it's the other way around. Tail recursion is a special case of loops. All tail recursive functions can be rewritten as a loop but there exist loops that can not be rewritten as a tail recursive function.

Did Python ever get tail recursion? There was a big controversy years ago. Guido didn't like it. But apparently something went in.

Enthusiasm for tail recursion comes mostly from LISP and LISP-adjacent people - those who learned to program from SICP.[1] This is neither good nor bad. Even MIT doesn't use SICP any more, though.

(The classic 20th century "programming should be hard and programmers should suffer" books:

- Structure and Interpretation of Computer Programs, Abelson and Sussman.

- Fundamental Algorithms and Semi-Numerical Algorithms, Knuth.

- Algorithms + Data Structures = Programs, Wirth.

- A discipline of programming, Dijkstra.

These are mostly of historical interest now, but at one time, knowing those defined a real computer scientist. Today, it's more important that your LLM has been trained on them.)

"Love the lambda." - footer of original article.

[1] https://archive.org/details/Sicp.2nd

  • Python still doesn't have tail recursion, and uses a small stack by default.

    I'll note that in modern imperative languages it is harder than it looks to figure out whether calls are really in tail position; things like exception handling, destructors etc. interfere with this, so even as a SICP fanboy I'll admit it's fair enough that some languages don't want to bother.

    • > and uses a small stack by default.

      It does not; it sets a low recursion limit, which is a rather different thing.

  • > Did Python ever get tail recursion? There was a big controversy years ago. Guido didn't like it. But apparently something went in.

    No. The only thing that went in is moving Python frames off of the C stack, but since afaik the recursion limit did not change it has essentially no impact on user code unless you play with said recursion limit.

High-profile places where this is sadly not implemented include:

- V8 JavaScript Engine

- Python

Tail recursion is beautifully deep and simple. It (and as a corollary CPS) makes clear what state matters to your loop (and function) and avoids extraneous variables in loops as well as implicit unneeded state in functions. It also makes it easier to show the correctness of your loops. Sure, there are other functional language constructs like fold and map, if your problem is amenable to them. Tail recursion is more general and simpler.

  • I have almost the opposite view: tail calls are great.

    But most of the time you will want to express your program in terms of more restricted combinators, because restriction makes the reader's job easier. Easier to understand, and easier to convince yourself that no weird things are happening.

    Basically, you might already agree that restricting mutation is good. This is the same principle.

    So when you see a 'fold' you know even without looking at the callback, that your program will run through the whole list and not exit early. When you see a 'map' you also know that no early exit will happen, but even more you know exactly how the return value will be constructed (whereas for fold that's arbitrary).

    However you are right that 'fold' and 'filter' and 'map', just like 'for' and 'while', are not good as universal building blocks.

    That's why you should define new combinators when you need them. Typically, that's when you define new data structures, eg when you define a tree you will also want to define a 'map' for it, and also various tree traversals and perhaps searches. Less typically, you also want new combinators for new algorithmic ideas, even on old data structures.

    Eg matrix multiplication is an interesting combinator. You provide the meaning of 'addition' and 'multiplication' as call-backs, and depending on your choice you get an algorithm for finding shortest paths or for matching regular expressions etc (in addition to the obvious matrix-multiplication on real numbers, of course). See https://news.ycombinator.com/item?id=43076088 for the former.
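
    A rough sketch of that idea in Python (illustrative, not from the linked thread): the "multiplication" below takes add and mul as callbacks, and plugging in (min, +) over a weight matrix turns it into a step of an all-pairs shortest-path computation.

        from functools import reduce
        import operator

        def mat_mul(A, B, add, mul, zero):
            # generic "matrix product": the caller supplies add, mul and the zero element
            n, m, p = len(A), len(B), len(B[0])
            return [[reduce(add, (mul(A[i][k], B[k][j]) for k in range(m)), zero)
                     for j in range(p)]
                    for i in range(n)]

        INF = float("inf")
        W = [[0, 3, INF],
             [INF, 0, 1],
             [2, INF, 0]]

        # (min, +) instead of (+, *): repeatedly "squaring" W converges on
        # all-pairs shortest path lengths; one step already finds 0 -> 1 -> 2 = 4
        mat_mul(W, W, min, operator.add, INF)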

    • I learned recursion in Pascal and had a ball implementing innumerable algorithms purely in recursive structures. The code was often more compact and simpler to explain — once you understood induction. I agree that combinators are helpful, but they’re not universal. Anyway, we’re probably violently agreeing.

      1 reply →

For what it’s worth, in F#, tail recursive functions are turned into equivalent loops by the .NET CLR.

I actually like the compromise. I get to write safe functional code while getting all the benefits of a highly optimised iterative operation.

I disagree with the title; loops are tail-recursive functions, but tail-recursive functions are not loops (in the sense that squares are rectangles, but rectangles are not squares).

It is true that every tail recursive function can be converted into a semantically equivalent loop via a transformation like CPS (which the author mentions). However, for mutually tail-recursive functions, this conversion loses control flow information. This is because after the CPS transformation, calls to the other function become calls to a continuation; this call usually must be implemented as an indirect jump. On the other hand, mutually tail-recursive functions can call each other with direct/statically-known jumps.

This loss of information might appear trivial, but in practice it has some important consequences. Classic examples are interpreter loops. It is well-known that computed gotos can result in modest to large speedups for interpreters [1]. The reason why is that computed gotos create an indirect jump per opcode, so a branch predictor can take advantage of correlations between opcodes. For example, looking at Python disassembly, the header of a standard range for loop compiles down to three opcodes: GET_ITER, FOR_ITER, STORE_FAST in sequence [2]. A branch predictor can recognize that the target of the "FOR_ITER" indirect jump will likely be the "STORE_FAST" instruction pointer; it cannot predict this in the naive implementation where jumps for all instructions are "merged" into a single indirect jump / switch at the top of the loop body. In this case, computed goto is effectively equivalent to a CPS transformation whose closures require no storage on the heap.

Suppose, however, we know even more information about the instruction sequence; for example, we know ahead of time that every FOR_ITER opcode will be followed by a STORE_FAST opcode. We could completely replace the indirect jump with a direct jump to the instruction pointer for the STORE_FAST opcode. Because modern branch predictors are very good, this will have about the same performance in practice as the computed goto loop.

However, consider the limiting case where we know the entire instruction sequence beforehand. If we write our interpreter as many mutually tail-recursive functions, with one function for every instruction, an optimizing compiler can replace every indirect call with a direct (tail-recursive) call to the function that implements the next instruction's opcode. With a good enough optimizer / partial evaluator, you can turn an interpreter into a compiler! This is known as the first Futamura projection [3].

To see an example of this in action, I wrote a prototype of a Brainfuck compiler via the Futamura projection; it uses LLVM as a partial evaluator [4]. The main interesting function is `interpret`, which is templated on the program counter / instruction. That is, `interpret` is really a family of mutually tail-recursive functions which statically call each other as described above. For short Brainfuck programs, the LLVM optimizer is able to statically compute the output of the Brainfuck program. (The one in the Godbolt link compiles to a loop, likely because LLVM does not want to unroll the mutual recursion too much.) You can play around with different Brainfuck programs by modifying the `program` string on line 5.

[1] https://eli.thegreenplace.net/2012/07/12/computed-goto-for-e...

[2] https://godbolt.org/z/rdhMvPo36

[3] https://en.wikipedia.org/wiki/Partial_evaluation#Futamura_pr...

[4] https://godbolt.org/z/WY4j931jf

  • The analogy is inverted - tail-recursive functions are a superset of loops (rectangles include squares), not vice versa. Loops can always be expressed as tail-recursive functions, but tail-recursive functions (particularly mutually recursive ones) can express control flow patterns that simple loops cannot without additional machinery.

  • I appreciate your thoughtful criticism of the post, to my eyes everything you are saying is correct.

> I wrote up a detailed blog post about tail call optimization in Elixir/Erlang and its performance. The TLDR; sort of is that non-tail-call-optimized recursive functions (body-recursive) can be faster and more memory efficient than TCO functions. This is something that I never thought before, that TCO is always faster seems to be a common misconception. My example is also not contrived - it’s an implementation of map.

https://pragtob.wordpress.com/2016/06/16/tail-call-optimizat...