Comment by qazxcvbnm

2 years ago

The HVM project is incredibly interesting and I applaud the great work Victor Taelin has put in practically implementing Interaction Nets and popularising Linear Logic.

I've noticed HVM since February last year and was quite convinced that they were really up to something; however now that I've gone through a pile of papers on Linear Logic and the book which HVM is based on (The Optimal Implementation of Functional Programming Languages, Asperti and Guerrini, https://www.amazon.co.jp/-/en/Andrea-Asperti/dp/0521621127), I believe that among other things, the issue presented here https://github.com/HigherOrderCO/HVM/issues/60 is really a fundamental blocker to serving as a general purpose runtime, which it appears HVM attempts at.

Fundamentally HVM is an impressively optimised implementation of Symmetric Interaction Combinators, a variety of Interaction Net. Symmetric Interaction Combinators is not related to lambda calculus in a trivial way, and thus as noted in the README, HVM does not 'support' the full Lambda Calculus. In a vast majority of cases the HVM approach really does simply evaluate lambda terms at impressive speed, but the failure cases are quite tricky to handle, in that they will fail silently, simply evaluating the term incorrectly. This issue however, is acknowledged in the README and can be tackled by the method described there, or by type-checking for these 'failure' cases https://github.com/HigherOrderCO/HVM/discussions/97.

The issue mentioned in #60 however, seems to be quite fundamental - HVM makes lambda evaluation fast by implementing lambda evaluation in terms of Symmetric Interaction Combinators and making sure each operation of SIC evaluates in constant time. This works in most cases, but as acknowledged in Asperti and Guerrini, in some cases the SIC representation of lambda terms simply become themselves very (exponentially) large. Victor appears to acknowledge that he is simply building a practical counterpart upon the theory as cited from Asperti and Guerrini, citing Asperti and Guerrini's 'safe rules', which has not yet been implemented, as his plan to prevent very large representations of lambda terms from occurring, but Asperti and Guerrini themselves acknowledge that their rules are probably incomplete. Crucially, these terms appear to show up in regular functional programming folds.

The promise of HVM is that we have finally found a way to naturally implement the parallelism inherent in pure lambda terms, without special annotation or extra accommodation for the runtime. As it stands, HVM still falls short of the lofty dream.

5 comments

qazxcvbnm

LightMachine 2 years ago

Victor here, thanks for the kind words. Regarding issue #60;

Short answer: this is true, but isn't a fundamental limitation and will be addressed in the future.

Long answer: this was quite a surprise to me back then, as it defies my understanding of "optimal", and was quite disappointing, but I learned a lot since. My current take is that, yes, some terms are quadratic on HVM while they are linear on the GHC, but that isn't due to some "catastrophic incompatibility", but just an inherent limitation of sharing-by-lazy-duplication. That said, as a practical runtime, HVM is not restricted to have only one way to share data. Rust, for example, has several ways: `&`, `&mut`, `Box`, `Rc`, `Arc`, `Cell`, `RefCell`, `Mutex`, `RwLock`, `* const T`, `*mut T`. We will add alternative pointer types to address this issue. That said, first we want to get the pure version completely polished, before moving into these optimizations. Meanwhile, it is worth noting that it is easy to fix these quadratic functions by adding strategic lambdas to avoid unnecessary cloning. This will make them linear again, so a functional language compiling to HVM properly wouldn't have any case where it underperforms GHC asymptotically (and would have several cases where it outperforms it!).

qazxcvbnm 2 years ago
Hi Victor, it's great to hear your ideas on how to tackle the issue. Do you have an example or a sketch of how strategic lambdas can be used to avoid unnecessary cloning? I'd love to hear more.
Indeed, if one were to figure out how to strategically add appropriate mechanisms to share data, perhaps the issue can be solved. It seems a highly nontrivial problem - but perhaps it would have taken someone brave as you to tackle it head on.
I can only hope that such strategic sharing could be solved and formalised - if we only had that together with HVM, effortless parallelism just feels like it could be within reach.
(I wonder if how the call-by-value linear logic translation compares with the call-by-name translation in terms of interaction nets, IIRC it should result equally in an optimal evaluator)
- LightMachine 2 years ago
  
  Of course! Let me write a complete overview on the subject. Consider the following program:
  (Foo a b) = (If a (F b) (G b))
  It is easy to note that Foo makes an unnecessary clone of the 'b' variable, since it is used in different branches. This could be avoided by adding a lambda, as follows:
  (Foo a b) = ((If a λb(F b) λb(G b)) b)
  That is what I mean when I said that, in most instances, cloning can be avoided by adding strategic lambdas. Now, in some cases, such as issue #60, this won't be possible. Issue #60 isolates a situation where lazy cloning is necessarily quadratic. Here it is:
  (Gen 0) = Nil (Gen n) = (Cons n (Gen (- n 1))) (Sum Nil) = 0 (Sum (Cons x xs)) = (+ x (Sum xs)) (Tails Nil) = (Cons Nil Nil) (Tails (Cons x xs)) = (Cons xs (Tails xs)) (Heads Nil) = Nil (Heads (Cons x xs)) = (Cons (Head x) (Heads xs)) (Head Nil) = 0 (Head (Cons x xs)) = x Main = (Sum (Heads (Tails (Gen 1000))))
  What this program does: first, it converts a list like [1,2,3,4,5] into a list of where each element stores its tail: [[2,3,4,5],[3,4,5],[4,5],[5]]. Then, it takes the head of each element, resulting in [2,3,4,5]. Notice it is a very inefficient, roundabout implementation of 'tail', which happens to be quadratic on HVM, and linear on GHC. In this specific case, this could be fixed by just writing Tail as usual:
  (Tail (Cons x xs)) = xs
  It is worth pointing out that this is the case in all examples where HVM is sub-optimal! That is, there is always an alternative algorithm that avoids unnecessary cloning. That said, of course, in more complex programs, it will be impossible to identify sub-optimal situations and fix automatically (unless supercompilers become practical). As such, it is important for HVM to be able to handle these programs without quadratic slowdowns. To understand how that could be accomplished, let's first note that the culprit of the quadratic slowdown is the way HVM handles a non-linear variable. On the line below:
  (Tails (Cons x xs)) = (Cons xs (Tails xs))
  HVM's default behavior is to assume that, by writing `xs` twice, you meant to do a lazy clone. In other words, the program above is compiled as:
  (Tails (Cons x xs)) = let [xs_a, xs_b] = xs.lazy_clone() (Cons xs_a (Tails xs_b))
  In order to work, lazy cloning adds a "fan node" to the program's graph, which uses space. That means that, if you perform a lazy clone and immediately discard it, you'll be left with a "useless clone node" that still occupies memory. On the program shown on issue #60, we're essentially constructing an entire Tail and immediately dropping it with a call to Head, and we do that recursively, so, after enough calls, we have, in memory, the equivalent of:
  let [_, xs0] = xs.lazy_clone() [_, xs1] = xs0.lazy_clone() [_, xs2] = xs1.lazy_clone() [_, xs3] = xs2.lazy_clone() ... [_, xs{N}] = xs{N-1}.lazy_clone() in xsN
  Notice how this expression takes a list, clones it several times, discards all the clones, and then immediately returns the original list. These accumulating useless let-expressions (i.e., cloning nodes) are the source of the quadratic slowdown. So, how do we fix it? The solution proposed on TOIFPL book is to add safe pointers, which would allow us to remove redundant clone nodes when safe. But there is a simpler approach: just do NOT make a lazy clone there, to begin with! While lazy cloning is great for things like higher order lambda compositions, in some cases, a good-old pass-by-reference would do a better job. In a future version of the HVM, we could have something like:
  (Tails (Cons x &xs)) = (Cons xs (Tails xs))
  Where `&` would mean "pass by immutable reference", skipping lazy cloning entirely, and making this program linear, as it should.
  In short, the culprit for the quadratic slowdown is merely the fact that HVM automatically inserts a ".lazy_clone()" whenever you use a variable more than once, but there is no inherent incompatibility between interaction nets and good-old pass-by-reference, so HVM will have different pointer types that allow a programmer to implement functions like Tails that behave like they would on GHC, without any associated slowdown.
  
  2 replies →