Triton sits between CUDA and PyTorch and is built to work smoothly within the PyTorch ecosystem. In CUDA, on the other hand, you can directly manipulate warp-level primitives and fine-tune memory prefetching to reduce latency in, e.g., attention algorithms, a level of control that Triton and PyTorch don't offer AFAIK.
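To make the tradeoff concrete, here's roughly what the Triton level of abstraction looks like (a minimal vector-add sketch; the kernel name and block size are my own choices, not anything standard): you program in blocks of elements with masked loads/stores, and the compiler decides the warp/thread mapping and memory pipelining for you.

    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
        # Each program instance handles one block of elements; there is no
        # explicit threadIdx/warp handling, Triton picks that mapping itself.
        pid = tl.program_id(axis=0)
        offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
        mask = offsets < n_elements
        x = tl.load(x_ptr + offsets, mask=mask)
        y = tl.load(y_ptr + offsets, mask=mask)
        tl.store(out_ptr + offsets, x + y, mask=mask)

    def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        out = torch.empty_like(x)
        n = out.numel()
        grid = (triton.cdiv(n, 1024),)
        add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
        return out

Everything below the block level (warp shuffles, prefetch distance, shared-memory staging) is left to the compiler, which is exactly the control a hand-written CUDA kernel keeps.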
The MLIR Python extensions do, though, as far as I could tell from the LLVM developer meeting.
MLIR is one of those things everyone seems to use, but nobody seems to want to write solid introductory docs for :(
I've been curious for a few years now to get into MLIR, but I don't know compilers or LLVM, and all the docs I've found seem to assume knowledge of one or the other.
(yes this is a plea for someone to write an 'intro to compilers' using MLIR)
Triton is somewhat limited in what it supports, and it's not really Python either: it's a Python-embedded DSL whose kernels are compiled, not run as ordinary Python code.
Or use the Hidet compiler (open source).
Never heard of Hidet before; when and for what would I use it over CUDA/Triton/PyTorch?
It is written in Python itself and emits efficient CUDA code, so you can see what is going on. The current focus is on inference, but hopefully training workloads will be supported soon. https://github.com/hidet-org/hidet
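It plugs into PyTorch as a torch.compile backend, so for existing models the entry point is something like the sketch below (based on the project README; check the repo for the current API and options):

    import torch
    import hidet  # importing hidet registers the 'hidet' backend for torch.compile

    # Assumed toy model and input just for illustration.
    model = torch.nn.Sequential(
        torch.nn.Linear(1024, 1024),
        torch.nn.ReLU(),
    ).cuda().eval()
    x = torch.randn(8, 1024, device='cuda')

    # Hidet traces the graph and generates/tunes CUDA kernels for it.
    model_opt = torch.compile(model, backend='hidet')

    with torch.no_grad():
        y = model_opt(x)

Compared to Triton you are not writing kernels at all here; you hand Hidet a whole graph and it does the kernel generation, which is why the current inference-only focus matters.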