Comment by LightMachine
2 years ago
Predictability is one of the most exciting aspects of HVM, to me. Because everything is linear, time and space costs are completely measurable, in a way that resembles C, but even more so. For example, in the following GitHub issue:
https://github.com/HigherOrderCO/HVM/issues/167
I discuss how HVM is able to perform deforestation, one of the core techniques that made GHC so fast, "for free" at runtime, without it being explicitly hardcoded. That is great, but the point I'd like to make is how I show it: by measuring the 'rewrite count' of a program with HVM's '-c' flag. It reports how many rewrites a program's evaluation took. Since each rewrite is a constant-time operation, this gives us a very precise metric of the complexity of a program.
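To make that concrete, here is a generic sketch of what deforestation does, written in plain Haskell syntax rather than HVM, and purely hypothetical (it is not the code from the issue): V1 allocates an intermediate list, and deforestation turns it into something equivalent to the fused loop in V2.

    -- Hypothetical illustration only (plain Haskell, not the code from the issue).
    -- V1 builds an intermediate list via map; deforestation eliminates that list,
    -- yielding the fused accumulator loop shown as V2.
    sumSquaresV1 :: Int -> Int
    sumSquaresV1 n = sum (map (\x -> x * x) [1 .. n])

    sumSquaresV2 :: Int -> Int
    sumSquaresV2 n = go 1 0
      where
        go i acc
          | i > n     = acc
          | otherwise = go (i + 1) (acc + i * i)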
In the issue above, I implemented two versions of the same function and measured their rewrite counts. Here are the tables:
Fn V1
N | rewrites
- | --------
0 | 402
1 | 802
2 | 1202
3 | 1602
4 | 2002
5 | 2402
6 | 2802
7 | 3202
8 | 3602
9 | 4002
Fn V2
N | rewrites
- | --------
0 | 1403
1 | 1408
2 | 1413
3 | 1418
4 | 1423
5 | 1428
6 | 1433
7 | 1438
8 | 1443
9 | 1448
From these numbers, we can tell V2 scales better than V1, without doing any benchmark. Not only that, but we're able to write explicit formulas for the runtime cost of each version:
TimeComplexity(FnV1(N)) = 402 + 400*N rewrites
TimeComplexity(FnV2(N)) = 1403 + 5*N rewrites
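As a quick sanity check on those coefficients, here is a tiny sketch in plain Haskell (not HVM code) that recovers the slopes directly from the rows of the two tables above:

    -- The rewrite counts below are copied from the two tables above.
    v1Rewrites, v2Rewrites :: [Int]
    v1Rewrites = [402, 802 .. 4002]   -- FnV1, N = 0 .. 9
    v2Rewrites = [1403, 1408 .. 1448] -- FnV2, N = 0 .. 9

    -- Rewrites added per unit increase of N, i.e. the linear coefficient.
    slope :: [Int] -> Int
    slope (a : b : _) = b - a
    slope _           = 0

    main :: IO ()
    main = do
      print (slope v1Rewrites)                        -- 400
      print (slope v2Rewrites)                        -- 5
      print (slope v1Rewrites `div` slope v2Rewrites) -- 80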
Both are therefore O(N), but V2 has an 80x smaller linear coefficient. This precision extends to memory (space) usage too. This is why we're so excited to build Kindelia, which applies HVM to the context of blockchain VMs: this granular measurability isn't available on GHC, which is why Cardano can't use the Haskell compiler and must resort to an interpreter, greatly affecting its throughput. Of course, this is just one of the many applications of HVM.
I wish I could say I understood all of that, but at least what you write is exciting enough for me to read the paper you linked, as it looks simple enough even for me:
https://reader.elsevier.com/reader/sd/pii/S0890540197926432
It's funny that, as a BTC maxi, I'm not fond of new tokens (like Cardano) being created, but the work they fund is exceptional (my favourite is Vitalik funding a lot of longevity science, which should be the job of governments, but they don't do anything about it).
Some more questions:
- Are you planning GPU backends? (Emitting parallel C code and using CUDA / OpenCL compilers, like Tinygrad does.)
- Can the code be used as a more efficient JAX / PyTorch compiler in the future? AI code is functional, though the building blocks (basic instructions, registers, cache sizes, memory bandwidth) are the main optimization criteria.
TBH, I'm somewhat of a Bitcoin maximalist too, which is kinda ironic since I hold no BTC; I guess the lack of expressivity of the network counters the great appreciation I have for its stable monetary policy and true decentralization. That's why, at HOC, I only accepted external funding under the condition that Kindelia may not have a pre-mine, and I am very glad to have found VCs that share this vision. (Kindelia is one of our internal projects, and is essentially Bitcoin with HVM-based contracts.)
To address your questions:
Yes, we have plans for GPU backends. In fact, I even wrote a working prototype some time ago! It reduces λ-terms on the GPU using the same rules as HVM, with all the locks and atomics in place, and it seems to achieve a near-ideal speedup, even with thousands of NVIDIA cores:
https://gist.github.com/VictorTaelin/e924d92119eab8b1f57719a...
That said, it is still just a single-file prototype. Sadly, we couldn't include a GPU backend in this funding round, but it is definitely something we'll invest in in the future, especially if we manage to grow as a company. Imagine writing pure Haskell functions and having them run on thousands of GPU cores with no effort: pure functional shaders, physics engines...
Regarding PyTorch, I don't think HVM would be more efficient than it for ML, because PyTorch is already maximally optimized to use GPUs to their limit. HVM should be seen as a way to run high-level programs in a massively parallel fashion without needing to write low-level CUDA code yourself, but it won't outperform manually optimized CUDA on GPUs. That said, I do believe interaction-net-based processors would greatly outperform GPUs by breaking the Von Neumann bottleneck and unifying memory and computation in billions of nano-interaction-cores, and that such an architecture could one day empower AI and make our LLMs and RNNs much faster.