
Comment by LightMachine

2 years ago

Victor here, thanks for the kind words. Regarding issue #60:

Short answer: this is true, but isn't a fundamental limitation and will be addressed in the future.

Long answer: this was quite a surprise to me back then, as it defied my understanding of "optimal", and it was quite disappointing, but I have learned a lot since. My current take is that, yes, some terms are quadratic on HVM while they are linear on GHC, but that isn't due to some "catastrophic incompatibility"; it is just an inherent limitation of sharing-by-lazy-duplication. That said, as a practical runtime, HVM is not restricted to a single way of sharing data. Rust, for example, has several: `&`, `&mut`, `Box`, `Rc`, `Arc`, `Cell`, `RefCell`, `Mutex`, `RwLock`, `*const T`, `*mut T`. We will add alternative pointer types to address this issue, but first we want to get the pure version completely polished before moving on to these optimizations. Meanwhile, it is worth noting that it is easy to fix these quadratic functions by adding strategic lambdas to avoid unnecessary cloning. This makes them linear again, so a functional language compiling to HVM properly wouldn't have any case where it underperforms GHC asymptotically (and would have several cases where it outperforms it!).

Hi Victor, it's great to hear your ideas on how to tackle the issue. Do you have an example or a sketch of how strategic lambdas can be used to avoid unnecessary cloning? I'd love to hear more.

Indeed, if one were to figure out how to strategically add appropriate mechanisms for sharing data, perhaps the issue could be solved. It seems a highly nontrivial problem, but perhaps it takes someone as brave as you to tackle it head on.

I can only hope that such strategic sharing can be solved and formalised; if only we had that together with HVM, effortless parallelism feels like it could be within reach.

(I wonder how the call-by-value linear logic translation compares with the call-by-name translation in terms of interaction nets; IIRC both should result in an optimal evaluator.)

  • Of course! Let me write a complete overview on the subject. Consider the following program:

        (Foo a b) = (If a (F b) (G b))
    

    It is easy to see that Foo makes an unnecessary clone of the 'b' variable, since it is used in two different branches, only one of which will ever run. This can be avoided by adding a lambda, as follows:

        (Foo a b) = ((If a λb(F b) λb(G b)) b)
    

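    To make the saving concrete, here is a toy Python cost model (my own sketch, not actual HVM semantics): any value shared across both branches is charged one explicit `clone` call, while the lambda version has the conditional pick a function first, so `b` is consumed exactly once. The names `foo_naive` and `foo_lambda` are hypothetical.

```python
# Toy cost model (assumption): a variable used in two places in HVM gets
# lazily cloned; we model that sharing as an explicit clone() call.
clones = {"n": 0}

def clone(v):
    clones["n"] += 1  # stands in for one allocated fan node
    return v

def foo_naive(a, b, f, g):
    # (Foo a b) = (If a (F b) (G b)): b occurs in both branches, so it is cloned
    b1, b2 = clone(b), b
    return f(b1) if a else g(b2)

def foo_lambda(a, b, f, g):
    # (Foo a b) = ((If a λb(F b) λb(G b)) b): If picks a function; b is used once
    return ((lambda x: f(x)) if a else (lambda x: g(x)))(b)

inc = lambda x: x + 1
dbl = lambda x: 2 * x
print(foo_naive(True, 10, inc, dbl), clones["n"])   # one clone was charged
clones["n"] = 0
print(foo_lambda(True, 10, inc, dbl), clones["n"])  # same result, zero clones
```

    Both versions compute the same answer; only the sharing cost differs.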
    That is what I meant when I said that, in most instances, cloning can be avoided by adding strategic lambdas. In some cases, though, such as issue #60, this isn't possible. Issue #60 isolates a situation where lazy cloning is necessarily quadratic. Here it is:

        (Gen 0)             = Nil
        (Gen n)             = (Cons n (Gen (- n 1)))
        (Sum Nil)           = 0
        (Sum (Cons x xs))   = (+ x (Sum xs))
        (Tails Nil)         = (Cons Nil Nil)
        (Tails (Cons x xs)) = (Cons xs (Tails xs))
        (Heads Nil)         = Nil
        (Heads (Cons x xs)) = (Cons (Head x) (Heads xs))
        (Head Nil)          = 0
        (Head (Cons x xs))  = x
        Main                = (Sum (Heads (Tails (Gen 1000))))
    

    What this program does: first, it converts a list like [1,2,3,4,5] into a list where each element stores a tail: [[2,3,4,5],[3,4,5],[4,5],[5]]. Then, it takes the head of each element, resulting in [2,3,4,5]. Notice this is a very inefficient, roundabout implementation of 'tail', which happens to be quadratic on HVM and linear on GHC. In this specific case, it could be fixed by just writing Tail as usual:

        (Tail (Cons x xs)) = xs
    
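    For intuition about the asymptotics, here is a small Python simulation (a sketch under an assumed cost model, not a measurement of HVM): each clone of a list is charged proportionally to its length, a strict analogue of the work the accumulated fan nodes represent lazily. The roundabout pipeline then performs a quadratic amount of clone work, even though a direct `Tail` would be constant per step.

```python
counter = {"cells": 0}

def clone(xs):
    # Assumed cost model: cloning a list of length k costs k cells
    counter["cells"] += len(xs)
    return list(xs)

def gen(n):                      # mirrors (Gen n)
    return [] if n == 0 else [n] + gen(n - 1)

def tails(xs):                   # mirrors (Tails (Cons x xs)) = (Cons xs (Tails xs))
    if not xs:
        return [[]]              # (Tails Nil) = (Cons Nil Nil)
    dup = clone(xs[1:])          # xs is used twice, so one occurrence is cloned
    return [xs[1:]] + tails(dup)

def heads(xss):                  # mirrors Heads, with (Head Nil) = 0
    return [(xs[0] if xs else 0) for xs in xss]

print(sum(heads(tails(gen(100)))))  # 4950: the value Main would compute for n=100
print(counter["cells"])             # 4950 = 100*99/2 cloned cells; the match with
                                    # the sum is a coincidence of this cost model
```

    Doubling n to 200 roughly quadruples `counter["cells"]` (to 19900), which is the quadratic blowup in miniature.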

    It is worth pointing out that this is the case in all examples where HVM is sub-optimal! That is, there is always an alternative algorithm that avoids unnecessary cloning. Of course, in more complex programs it will be impossible to identify and fix sub-optimal situations automatically (unless supercompilers become practical). As such, it is important for HVM to be able to handle these programs without quadratic slowdowns. To understand how that could be accomplished, first note that the culprit behind the quadratic slowdown is the way HVM handles a non-linear variable. On the line below:

        (Tails (Cons x xs)) = (Cons xs (Tails xs))
    

    HVM's default behavior is to assume that, by writing `xs` twice, you meant to do a lazy clone. In other words, the program above is compiled as:

        (Tails (Cons x xs)) =
          let [xs_a, xs_b] = xs.lazy_clone()
          (Cons xs_a (Tails xs_b))
    

    In order to work, lazy cloning adds a "fan node" to the program's graph, which uses space. That means that, if you perform a lazy clone and immediately discard one half, you'll be left with a "useless clone node" that still occupies memory. In the program shown in issue #60, we're essentially constructing an entire Tail and immediately dropping it with a call to Head, and we do that recursively, so, after enough calls, we have in memory the equivalent of:

        let [_, xs0] = xs.lazy_clone()
            [_, xs1] = xs0.lazy_clone()
            [_, xs2] = xs1.lazy_clone()
            [_, xs3] = xs2.lazy_clone()
            ...
            [_, xs{N}] = xs{N-1}.lazy_clone()
        in xs{N}
    

    Notice how this expression takes a list, clones it several times, discards all the clones, and then returns the original list. These accumulating useless let-expressions (i.e., cloning nodes) are the source of the quadratic slowdown. So, how do we fix it? The solution proposed in the TOIFPL book is to add safe pointers, which would allow us to remove redundant clone nodes when it is safe to do so. But there is a simpler approach: just do NOT make a lazy clone there to begin with! While lazy cloning is great for things like higher-order lambda compositions, in some cases good old pass-by-reference would do a better job. In a future version of HVM, we could have something like:

        (Tails (Cons x &xs)) = (Cons xs (Tails xs))
    

    Where `&` would mean "pass by immutable reference", skipping lazy cloning entirely and making this program linear, as it should be.
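    The leftover-node picture above can also be mimicked in Python (a toy model of my own, not HVM's actual memory layout): each discarded lazy clone leaves one indirection node behind, so reading the original value back after N discarded clones takes N pointer hops, which is precisely the traversal that `&` would skip.

```python
class CloneNode:
    """Toy stand-in for a leftover fan node: a lazy clone whose other
    half was discarded still leaves an indirection to traverse."""
    def __init__(self, inner):
        self.inner = inner

def chase(v, hops):
    # Follow leftover clone nodes until the underlying value is reached
    while isinstance(v, CloneNode):
        v = v.inner
        hops["n"] += 1
    return v

xs = [1, 2, 3]
v = xs
for _ in range(1000):
    v = CloneNode(v)      # clone, then immediately drop one half

hops = {"n": 0}
print(chase(v, hops) is xs)  # True: we get the original list back...
print(hops["n"])             # ...but only after 1000 useless hops
```

    Under the `&` scheme, no `CloneNode` would ever be allocated, and reaching the value would be a single step regardless of how many times it was shared.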

    In short, the culprit for the quadratic slowdown is merely the fact that HVM automatically inserts a ".lazy_clone()" whenever you use a variable more than once, but there is no inherent incompatibility between interaction nets and good-old pass-by-reference, so HVM will have different pointer types that allow a programmer to implement functions like Tails that behave like they would on GHC, without any associated slowdown.

    • Thanks for the detailed breakdown of your analysis!

      The case of 'unnecessarily' erasing nodes after duplication must be handled and cannot be dismissed, since it covers much of the core use case of lazy evaluation.

      Your breakdown suggests that compilers targeting HVM would use some heuristic that applies `&` immutable references to most data-like nodes, and lazy duplication to code-like nodes. I'm quite sure that still doesn't solve the issue of unnecessary duplications in the general case, but it does seem plausible that such measures suffice to cut the Gordian knot in practice, similar to how you eliminated the 'oracle' with your superposition resolution: very practically sidestepping the uncommon, complicated case. Indeed, the distinction between code-like nodes and data-like nodes dictates much of their behaviour during reduction, whereas in much of the PLT literature and in pure interaction nets, every node is 'code-like'.

      I'm not sure this is good enough in practice if, as I believe, HVM is meant to be a compiler target; I fear that good heuristics may not be enough where the 'unnecessary' erase-after-duplicate patterns arise from complex code paths, and that they may become apparent only intermittently (but catastrophically), or only upon inspecting compilation output.

      Looking forward to more of your work!
