I've been following the work you've been doing for a while now, and the culmination of all your progress over the past few years toward HVM has been very inspirational. Keep up the good work, and best of luck with HighOrderCo. If anyone reading HVM's and Inpla's gentle introductions to Interaction Nets is looking for an overview of Lafont's papers and Sato's implementation designs, you might enjoy:
https://wiki.xxiivv.com/site/interaction_nets.html
If someone is looking for a graphical playground to experiment with interaction nets, here's Sylvain Lippi's InTwo(gtk):
https://github.com/max22-/intwo
Anyone looking to better understand linear logic in the context of IN ought to have a look at Victor's Absal:
https://github.com/VictorTaelin/abstract-algorithm
Someone below linked to Baker's excellent linear lisp paper (psi-lisp); I'd like to augment this recommendation with the Linear Completeness theory of combinators:
http://tunes.org/~iepos/joy.html#linear
Author here! Thanks for posting :) Just want to make a brief note here that we raised a $4.5M seed round to found a tech startup, the Higher Order Company, to work on the HVM. Our goal is to improve the runtime, the Kind language, and eventually build our own hardware. There are already several cool ideas in the pipeline, so expect exciting news and massive improvements soon!
Congrats on the funding. Happy to hear!
Are you hiring?
Yes, starting next week or so! We'll be looking for engineers who have skills relevant to our projects, including compilers, low-level assembly optimization, functional programming, type theory, parallel computing, and so on. Right now, we're quite busy with incorporation and bureaucracy, but you should see job postings on our soon-to-be-released landing page (higherorderco.com). Meanwhile, I answer DMs on Twitter (@VictorTaelin) and Discord (VictorTaelin#2253).
What do the GHC folks (or anybody from the Haskell community) say about your work?
This seems very interesting, and I like your FAQ where you get honest, but your intro almost put me off:
"be up to exponentially faster than alternatives" – hmm
"Interaction Net, which supersedes the Turing Machine" – Interaction Net link points to a paper on Interaction Combinators. And talking about superseding Turing machines is embarrassing – do you have Oracle powers? (https://en.wikipedia.org/wiki/Oracle_machine)
"set to scale towards uncharted levels of performance" – that's just a bit cringy
I've come across claims of no GC being needed before, and I don't buy it except in restricted domains, and it would be nice to have a link to what you mean by 'beta optimality'. I would like to see the sort benchmark done on something realistic (recursive bubble sort: those two words together don't look good). Also automatic parallelism..., immutability apparently for free, everything higher-order; it seems you hit a lot of holy grails in one go. Rather too many for me to feel comfortable.
The FAQ, by contrast, is a whole lot more reasonable and upfront; perhaps be careful about overselling yourself?
I agree with most of your points; we should definitely rewrite and tone down the README. The FAQ was written a few months later, in response to that kind of negative reception. To address your points: by "superseding" the Turing Machine, I do not mean Interaction Nets can compute uncomputable functions! The meaning is that, while both are equivalent in terms of *computability*, Interaction Nets have improved *computational* characteristics, such as strong confluence and inherent concurrency. This is a very technical aspect that is addressed in the paper I linked. Regarding not buying the claim that no GC is needed: this is interesting, since it is a well-established result. HVM is entirely linear, so it can avoid global GCs for the same reason Rust can. The difference is that HVM is inherently higher level, since its support for lambdas is far more first-class than Rust's closures, which are comparatively slow and almost guarantee borrow-checker hell.
(Also, "exponentially faster" is used in the technical sense here! Optimal evaluators can reduce some programs in linear time where conventional runtimes take exponential time.)
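A minimal sketch of the kind of program this refers to (plain Haskell for familiarity, not HVM code; the asymptotic claim assumes λ-encoded booleans on the optimal-evaluator side):

```haskell
-- Negate a boolean 2^n times by repeated self-composition.
selfCompose :: Int -> (a -> a) -> (a -> a)
selfCompose 0 f = f
selfCompose n f = selfCompose (n - 1) (f . f)

main :: IO ()
main = print (selfCompose 25 not True)
-- A conventional runtime ends up applying `not` about 2^25 times here.
-- With λ-encoded booleans, an optimal evaluator can share the repeated
-- compositions and normalize the corresponding term in time that grows
-- roughly linearly with n rather than exponentially.
```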
Interesting, though I can't help but feel that the tone of the first few paragraphs could be a bit more... humble? Even if the underlying model is much better, you can't really compare a complete project to one that is in alpha. Real-world complexities catch up in the later phases of bigger projects.
Nonetheless, I eagerly await its future.
I'm working on improving my communication :(
That’s the only thing that matters!
I've never heard of interaction nets (which just shows how overspecialized CS can be: I guess I have a PhD in PLT?); so... I wikipedia'd it up. I normally don't trust Wikipedia any further than I can throw it, but ... "interaction nets":
https://en.wikipedia.org/wiki/Interaction_nets
Have you heard of Linear Lisp? https://www.cs.utexas.edu/users/hunt/research/hash-cons/hash...
Yeah — I used to program in a GC-free LISP in grad school. That's why I'm so flabbergasted.
And an entire list of other models of computation: https://en.wikipedia.org/wiki/Model_of_computation
For anyone interested in how interaction nets work, here's how one would program them directly:
https://github.com/inpla/inpla/blob/main/Gentle_introduction...
Inpla is a very interesting project that is more elegant than HVM in several regards (as it doesn't need to worry about functional programming!) and should be better known. I added a link to it from HVM's repo a few months ago.
It is great to see the improved documentation! Thanks, Victor.
Hopefully once the full LC is supported, we can try to make an STG-to-HVM-language translator.
One thing that I don't see addressed (and the main problem I had with Haskell) is the predictability of runtime: if I change a function somewhere to be 2x slower by doing something differently, can that cause a much bigger than 2x slowdown in the functions that use it?
With Rust, the type system guarantees safety and a predictable runtime as well. I don't see how any model that doesn't have a linear/affine type system implemented and exposed to the programming language can give me the same guarantee.
Predictability is one of the most exciting aspects of HVM, to me. That's because everything is linear, so time and space costs are completely measurable, in a way that resembles C, but even more profoundly. For example, in the following GitHub issue:
https://github.com/HigherOrderCO/HVM/issues/167
I discuss how HVM is able to perform deforestation, one of the core techniques that made GHC so fast, "for free" at runtime, without it being explicitly hardcoded. That is great, but the point I'd like to make is how I show it: by measuring the 'rewrite count' of a program with HVM's '-c' flag, which reports how many rewrites a program's evaluation took. Since each rewrite is a constant-time operation, this gives us a very precise metric of the complexity of a program.
On the issue above, I implemented two versions of the same function, and measured their rewrite counts. Here are the tables:
From these numbers, we can tell V2 scales better than V1, without doing any benchmark. Not only that, but we're able to write explicit formulas for the runtime cost of each version:
Which means both are O(N), but V2 has an 80x smaller linear coefficient. This precision extends to memory usage too. This is why we're so excited to build Kindelia, applying HVM to the context of blockchain VMs: this kind of granular measurability isn't available in GHC, which is why Cardano can't use the Haskell compiler and must resort to an interpreter, greatly affecting its throughput. Of course, this is just one of many applications of HVM.
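For readers unfamiliar with the term, here is a generic illustration of what deforestation removes. This is plain Haskell, not the actual code from issue #167, just a sketch of the idea:

```haskell
-- V1 builds intermediate lists between the enumeration, `map`, and `sum`;
-- a deforested V2 does the same work in a single pass, with no
-- intermediate structure allocated at all.
v1 :: Int -> Int
v1 n = sum (map (* 2) [1 .. n])

v2 :: Int -> Int
v2 n = go 1 0
  where
    go i acc
      | i > n     = acc
      | otherwise = go (i + 1) (acc + 2 * i)

main :: IO ()
main = print (v1 1000 == v2 1000)   -- True: same result, different costs
```

GHC gets from V1-style code to V2-style code at compile time through hardcoded fusion rules; the point of the issue above is that HVM's rewrite semantics produces a similar effect at runtime, and the '-c' rewrite counts make the difference in constant factors directly measurable without running a wall-clock benchmark.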
I wish I could say that I understand you, but at least what you write is exciting enough for me to read the paper that you linked, as it looks simple enough even for me:
https://reader.elsevier.com/reader/sd/pii/S0890540197926432
It's funny that, as a BTC maxi, I'm not fond of new tokens (like Cardano) being created, but the work they are funding is exceptional (my favourite is Vitalik funding a lot of longevity science, which should be the job of governments, but they don't do anything about it).
Some more questions:
- Are you planning GPU backends (emitting parallel C code and using CUDA / OpenCL compilers, like Tinygrad does)?
- Can it be used as a more efficient JAX / PyTorch compiler in the future? AI code is functional, though the building blocks (basic instructions, registers, cache sizes, memory bandwidth) are the main optimization criteria.
Can someone explain or share references on how HVM (or GHC or similar computational models) beat optimized compiled imperative languages like C, C++, or Rust?
I took a quick look at the HVM runtime and can vaguely make out the shape of the memory and code that reduces terms. While it's cool to have a computational model that evaluates everything only once and avoids duplicating computation, if it does not compile programs to machine code to exploit hardware, then the single-core performance will be worse than C/C++/Rust.
Something that compiles to a few machine instructions in C/C++/Rust instead requires reducing a graph of terms here: many CPU instructions for each reduction, and many memory accesses that may have poor locality.
Is the idea that programs contain enough parallelism that is out of reach of human programmers so that HVM will make up for the single-core performance drag through parallelism?
Maybe I'm missing what the HVM compiler does?
Maybe hardware acceleration with FPGAs or ASICs could be applied?
In other words, can this beautiful and elegant computational model be implemented with efficiency that matches C/C++/Rust?
Where have you read that HVM beats C or Rust? That is not something I've ever written, and we're far from that, especially considering the budget that has been thrown at Rust, although we could (and hope to) get there one day. This is possible in theory, because interaction nets are as lightweight as Rust, but have the added benefit of built-in parallelism. Please see the whole FAQ; I think it addresses all your questions. Let me know if you have any further ones.
Thanks for the reply. The FAQ doesn't really answer my specific questions, but I understand that HVM is still young and there aren't answers to everything yet.
The FAQ mentions that every reduction rule can be compiled to machine code in theory. That's the core of what I'm asking - can reduction be implemented efficiently? I suspect it's not enough to have fast reduction rules, because that still requires a runtime that spends CPU cycles and memory accesses on reduction housekeeping. To be fast you need to minimize explicit reduction at runtime because that's not what CPUs are good at.
I guess an aggressive HVM compiler would find sub-programs that can be treated as leaf nodes and replaced entirely with compiled machine code. The machine code doesn't use interaction nets but it computes an equivalent result to the interaction net. This doesn't mean that reduction and interaction nets aren't used, they still serve a purpose for lazy evaluation and parallelism at higher levels of the program. The compiler would have to figure out where to draw the line between the high-level reduction runtime and the leaf nodes.
Anyway, this is just what comes to mind. I don't really know how HVM works, but thanks for sharing it.
It's a scam. Don't bother.
How does the Interaction Nets model compare with other concurrency models such as dataflow programming with Arrow? (For example, I maintain a distributed/reactive Clojure dialect for unified fullstack web development which seems to show a deep connection to HVM despite being based on dataflow rather than interaction nets - https://github.com/hyperfiddle/electric)
Alright, let me be the skeptic here and ask. Honest question, not intending to just dismiss anybody outright. But I have questions.
I've looked up Victor Taelin and his HighOrderCo company that supposedly just raised $4.5M. So here we have this Brazilian guy who graduated university in 2017, and I don't mean that in a bad way. It's just a fact, a data point. Fresh ideas are always welcome. From his Twitter feed he seems to be deeply into the crypto space. He is over-, or at least heavily, upselling this HVM project as the future of computing. Lots of fuss about interaction nets and linear logic, and about this being the next thing after Turing Machines and the Lambda calculus. OK, strong claims. It makes me a tad suspicious, though. Outperforming GHC? Running a new chain that's not a chain, so it's not a cryptocurrency, but somehow it's a chain, but it's formally verified, hm, ok. And how can he strongly claim this to be the superior thing if he just announced how to model complex numbers in his HVM project, and that GPT-4 helped him get this insight? I'd have expected Clifford Algebras to be basic knowledge for everybody working in this space.
I'm only superficially familiar with ETH, with Haskell's GHC, with Rust, with HoTT. But I'd like to hear from people who know more than I do, and who are not already obvious fanboys of HighOrderCo with submarine PR interests -- how much weight should I put on this? Is it worth reading about, getting my head busy with? I've spent an hour now trying to check up on things, but apart from a lot of fluff and quite a lot of technical jargon that went above my head, I'm left confused.
Is there something or is this just supposed to confuse me to get into the next scam?
Edit: Just to add to this. https://news.ycombinator.com/item?id=35344514 discusses an article written about the underlying tech. It's a terrible article on the level of a newcomer to programming who doesn't know much about functional programming and even admits as much. Victor Taelin comments with "Very well written and beautiful article." This must be a scam. I'll stop wasting my time on this until somebody manages to convince me otherwise.
Hi, I've spent quite an amount of time digging into papers on the computational models which Victor has linked to.
I do notice Victor's tendency to be hyperbolic and defensive, which is quite unfortunate, and I don't know or care much about the blockchain that Victor is also running. (By the way, I think Victor's real name is probably Victor Maia rather than Taelin, as you can corroborate from a number of sources; it seems Taelin is just his handle on GitHub. It also seems that Victor used to work at the Ethereum Foundation with the title of Researcher: https://bobsummerwill.com/ethereum-foundation-people/?)
On the other hand, the computational model that Victor's HVM is built on was invented by Yves Lafont and strongly inspired by Jean-Yves Girard (known for Linear Logic, Girard's paradox, and System F, which according to Wikipedia "forms a theoretical basis for languages such as Haskell"). I can assure you that Lafont and Girard are logicians of the highest caliber, and the promises that HVM intends to fulfill are the same notions envisioned by Girard and the computer scientists working on the Geometry of Interaction and Interaction Nets. If you spend a few minutes considering the case in the README where HVM claims to outperform GHC (e.g. Lambda Multiplication), it should not be hard to understand why it is possible in certain cases for HVM to outperform essentially anything else, and the potential that improvements in computational models can bring.
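To make that concrete, here is my own rough sketch (ordinary Haskell, not the README's code) of the kind of thing "Lambda Multiplication" refers to: arithmetic on Church-encoded numerals, where the number n is represented as n-fold composition of a function.

```haskell
-- A Church numeral, specialised to one result type to keep the sketch
-- extension-free: the numeral n applies its first argument n times.
type C a = (a -> a) -> a -> a

two, three :: C a
two   f x = f (f x)
three f x = f (f (f x))

-- Multiplication is composition of the iterators: (mul m n) applies f
-- exactly m * n times.
mul :: C a -> C a -> C a
mul m n f = m (n f)

main :: IO ()
main = print (mul two three (+ 1) 0)   -- prints 6
```

On heavily λ-encoded programs like this, how much work the evaluator can share dominates the cost, and that is exactly where optimal reduction (and hence HVM) can pull far ahead of a conventional graph reducer.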
In any case, Victor has put serious optimisation effort into a very promising computational model, and that alone deserves earnest commendation. Reading into the details of HVM makes it clear that, although it works today, it is still very much a prototype in progress. Personally, I think the computational challenges that still need to be tackled to bring Interaction Nets to life remain intractable (as do Asperti and Guerrini, IIRC in chapter 10 of The Optimal Implementation of Functional Programming Languages; Victor mentions basing his implementation on their work, which makes it surprising that he is surprised that optimal reduction has edge cases), but any serious effort in this direction is badly needed and should probably be encouraged. If VCs are putting their money behind bringing Girard's ideas into the popular imagination, then perhaps for once they are doing good.
Thanks for the balanced feedback. It's not too far from the impression I got. I do take Lafont and Girard seriously and have greatest respect for them.
The HVM project is incredibly interesting and I applaud the great work Victor Taelin has put in practically implementing Interaction Nets and popularising Linear Logic.
I've been following HVM since February last year and was quite convinced that they were really onto something; however, now that I've gone through a pile of papers on Linear Logic and the book HVM is based on (The Optimal Implementation of Functional Programming Languages, Asperti and Guerrini, https://www.amazon.co.jp/-/en/Andrea-Asperti/dp/0521621127), I believe that, among other things, the issue presented here https://github.com/HigherOrderCO/HVM/issues/60 is a fundamental blocker to HVM serving as the general-purpose runtime it appears to aim to be.
Fundamentally, HVM is an impressively optimised implementation of Symmetric Interaction Combinators, a variety of Interaction Net. Symmetric Interaction Combinators are not related to the lambda calculus in a trivial way, and thus, as noted in the README, HVM does not 'support' the full Lambda Calculus. In the vast majority of cases the HVM approach really does evaluate lambda terms at impressive speed, but the failure cases are quite tricky to handle, in that they fail silently, simply evaluating the term incorrectly. This issue, however, is acknowledged in the README and can be tackled by the method described there, or by type-checking for these 'failure' cases: https://github.com/HigherOrderCO/HVM/discussions/97.
The issue mentioned in #60, however, seems to be quite fundamental. HVM makes lambda evaluation fast by implementing it in terms of Symmetric Interaction Combinators and making sure each SIC operation evaluates in constant time. This works in most cases, but, as acknowledged in Asperti and Guerrini, in some cases the SIC representation of lambda terms itself becomes very (exponentially) large. Victor appears to acknowledge that he is building a practical counterpart to the theory as presented by Asperti and Guerrini, citing their 'safe rules', which have not yet been implemented, as his plan to prevent very large representations of lambda terms from occurring; but Asperti and Guerrini themselves acknowledge that these rules are probably incomplete. Crucially, such terms appear to show up in ordinary functional-programming folds.
The promise of HVM is that we have finally found a way to naturally exploit the parallelism inherent in pure lambda terms, without special annotations or extra accommodation for the runtime. As it stands, HVM still falls short of that lofty dream.
Victor here, thanks for the kind words. Regarding issue #60:
Short answer: this is true, but isn't a fundamental limitation and will be addressed in the future.
Long answer: this was quite a surprise to me back then, as it defied my understanding of "optimal", and was quite disappointing, but I have learned a lot since. My current take is that, yes, some terms are quadratic on HVM while they are linear on GHC, but that isn't due to some "catastrophic incompatibility", just an inherent limitation of sharing by lazy duplication. That said, as a practical runtime, HVM is not restricted to having only one way to share data. Rust, for example, has several: `&`, `&mut`, `Box`, `Rc`, `Arc`, `Cell`, `RefCell`, `Mutex`, `RwLock`, `*const T`, `*mut T`. We will add alternative pointer types to address this issue. But first we want to get the pure version completely polished before moving on to these optimizations. Meanwhile, it is worth noting that it is easy to fix these quadratic functions by adding strategic lambdas to avoid unnecessary cloning. This makes them linear again, so a functional language compiling to HVM properly wouldn't have any case where it underperforms GHC asymptotically (and would have several cases where it outperforms it!).
Hi Victor, it's great to hear your ideas on how to tackle the issue. Do you have an example or a sketch of how strategic lambdas can be used to avoid unnecessary cloning? I'd love to hear more.
Indeed, if one were to figure out how to strategically add appropriate mechanisms to share data, perhaps the issue could be solved. It seems a highly nontrivial problem - but perhaps it would take someone as brave as you to tackle it head on.
I can only hope that such strategic sharing can be solved and formalised - if only we had that together with HVM, effortless parallelism feels like it could be within reach.
(I wonder how the call-by-value linear logic translation compares with the call-by-name translation in terms of interaction nets; IIRC it should equally result in an optimal evaluator.)
This reminds me strongly of Absal[0]. Could somebody familiar with both give a comparison?
This looks like a really cool idea, and I'd love to see a GHC backend for it.
It might also be nice to have a brief explanation of what an interaction net is, without just linking to a big theory paper.
[0] https://github.com/VictorTaelin/abstract-algorithm
eta: I only now notice that Victor Taelin is the author of both packages.
Absal is just an early implementation and experiment. It is written in JS and not meant to be a production-ready runtime or general-purpose compilation target; HVM has that goal. That said, Absal is a great repo for studying and learning the theory.
Are there plans to expand documentation on how to use HVM as a general compilation target like LLVM?
Yes! But first we want to implement some essential features, like full lambda support and some high priority optimizations. Once HVM is ready to support production software, we'll provide docs on how to best use it as a general compilation target. This will be announced in our Discord server: http://discord.higherorderco.com/
How does the compiler know which parts of the code can be run in parallel?
My understanding is that while parallelism (actually executing instructions simultaneously) is a property of the runtime, it depends on concurrency (the separation of computations into independent ones), which is always something that has to be specified in the program's source code.
Erlang has a similar property of "scaling with core count for free", provided you write your program with prodigious use of processes (which Erlang makes very easy and idiomatic). How is it that HVM is able to identify independent computations?
I admit to being only passingly familiar with Haskell-style FP, so perhaps the answer to my question is obvious.
What's the role of the "Kind" language in all of this? I only see Lisp-like expressions in the demonstrations of the new VM's capabilities. IIRC the Kind language was made by the same organization, and I really liked Kind as a fresh take on pure functional programming with dependent types. I can't find it now, but it had a different notion of dependency, where a dependent function type could depend on an argument (as usual) and (IIRC) on the function value itself as well (not usual).