Comment by LightMachine

1 year ago

It is an interpreter that runs on GPUs, and a compiler to native C and CUDA. We don't target SPIR-V directly, but aim to. Sadly, while the C compiler results in the expected speedups (3x-4x, and much more soon), the CUDA runtime didn't achieve substantial speedups compared to the non-compiled version. I believe this is due to warp divergence: with non-compiled procedures, we can actually merge all function calls into a single "generic" interpreted function expander that can be reduced by warp threads without divergence. We'll be researching this more extensively going forward.
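To make the warp-divergence point concrete, here is a toy cost model in plain C. It is only a sketch under stated assumptions, not HVM's actual code: `WARP_SIZE`, `MAX_FNS`, `steps_compiled`, `steps_interpreted`, and the idea of a per-function "body length" are all hypothetical names made up for illustration. It assumes a warp runs in lockstep, so lanes that branch into different compiled functions get serialized, while a single generic expander loop keeps every lane on the same code path. It ignores the per-step overhead of interpretation.

```c
#include <stdbool.h>

#define WARP_SIZE 32   /* lanes per warp (hypothetical constant for this sketch) */
#define MAX_FNS   64   /* upper bound on distinct function ids in this sketch */

/* Compiled style: each function is its own code path. A lockstep warp runs
   every distinct function body present in the warp one after another, with
   the lanes that did not call it masked off. */
int steps_compiled(const int fn_of_lane[WARP_SIZE], const int body_len[MAX_FNS]) {
    bool seen[MAX_FNS] = {false};
    int total = 0;
    for (int lane = 0; lane < WARP_SIZE; lane++) {
        int f = fn_of_lane[lane];
        if (!seen[f]) {            /* first lane that calls f pays for its body */
            seen[f] = true;
            total += body_len[f];
        }
    }
    return total;
}

/* Interpreted style: all lanes step through one generic expander loop that
   reads the function body as data. Lanes with shorter bodies idle, so the
   warp pays for the longest body only. */
int steps_interpreted(const int fn_of_lane[WARP_SIZE], const int body_len[MAX_FNS]) {
    int longest = 0;
    for (int lane = 0; lane < WARP_SIZE; lane++) {
        int len = body_len[fn_of_lane[lane]];
        if (len > longest) {
            longest = len;
        }
    }
    return longest;
}
```

In this model, a warp whose lanes call four different functions with body lengths 5, 7, 3 and 6 costs 21 lockstep steps in the compiled style and 7 in the interpreted style. When every lane calls the same function, the two costs are identical, which matches the intuition that divergence, not compilation itself, is what hurts.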

animaomnium 1 year ago

Oh that's cool! Interested to see where your research leads. Could you drop me a link to where the interaction net → CUDA compiler lives? I skimmed through the HVM2 repo and only read the .cu runtime file.

Edit: nvm, I read through the rest of the codebase. I see that HVM compiles the inet to a large static term and then links against the runtime.

https://github.com/HigherOrderCO/HVM/blob/5de3e7ed8f1fcee6f2...

Will have to play around with this and look at the generated assembly, to see how much of the runtime a modern C/CUDA compiler can inline.

Btw, nice code: very compact and clean, well-organized, and easy to read. Rooting for you!