Comment by LightMachine
8 months ago
It is an interpreter that runs on GPUs, and a compiler to native C and CUDA. We don't target SPIR-V directly, but aim to. Sadly, while the C compiler results in the expected speedups (3x-4x, and much more soon), the CUDA runtime didn't achieve substantial speedups compared to the non-compiled version. I believe this is due to warp divergence: with non-compiled procedures, we can actually merge all function calls into a single "generic" interpreted function expander that can be reduced by warp threads without divergence. We'll be researching this more extensively going forward.
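To make the warp-divergence point concrete, here is a rough toy model in plain C (not HVM code, and not CUDA). It assumes a 32-lane warp where lanes that branch to different code paths are run one path at a time, so the cost of a step is the number of distinct paths taken. The names `compiled_passes`, `interpreted_passes`, `fn_id`, and `MAX_FUNCS` are made up for illustration, and the model ignores instruction counts, memory traffic, and everything else that matters on real hardware.

```c
#include <assert.h>
#include <stdbool.h>

#define WARP_SIZE 32   /* lanes per warp on current NVIDIA GPUs */
#define MAX_FUNCS 64   /* hypothetical upper bound on distinct function ids */

/* Compiled dispatch (toy model): each lane jumps to the body of its own
 * function, so the warp runs one serialized pass per distinct function id. */
int compiled_passes(const int fn_id[WARP_SIZE]) {
    bool seen[MAX_FUNCS] = {false};
    int passes = 0;
    for (int lane = 0; lane < WARP_SIZE; lane++) {
        assert(fn_id[lane] >= 0 && fn_id[lane] < MAX_FUNCS);
        if (!seen[fn_id[lane]]) {
            seen[fn_id[lane]] = true;
            passes++;
        }
    }
    return passes;
}

/* Interpreted dispatch (toy model): every lane runs the same generic
 * expander and only the data it reads differs, so the model charges a
 * single pass no matter which functions the lanes are expanding. */
int interpreted_passes(const int fn_id[WARP_SIZE]) {
    (void)fn_id; /* the function id is data here, not a branch target */
    return 1;
}
```

With `fn_id[lane] = lane % 8`, `compiled_passes` returns 8 and `interpreted_passes` returns 1, which is the shape of the effect described above. Whether this is what actually limits the CUDA backend is the open question, not something this sketch shows.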
Oh that's cool! Interested to see where your research leads. Could you drop me a link to where the interaction net → CUDA compiler resides? I skimmed through the HVM2 repo and only read the .cu runtime file.
Edit: nvm, I read through the rest of the codebase. I see that HVM compiles the inet to a large static term and then links against the runtime.
https://github.com/HigherOrderCO/HVM/blob/5de3e7ed8f1fcee6f2...
Will have to play around with this and look at the generated assembly to see how much of the runtime a modern C/CUDA compiler can inline.
Btw, nice code: very compact and clean, well-organized, and easy to read. Rooting for you!