Comment by t55
8 months ago
Triton sits between CUDA and PyTorch in level of abstraction and is built to work smoothly within the PyTorch ecosystem. In CUDA, on the other hand, you can directly manipulate warp-level primitives and fine-tune memory prefetching to reduce latency in, e.g., attention algorithms, a level of control that Triton and PyTorch don't offer AFAIK.
MLIR extensions for Python do, though, as far as I could tell from the LLVM developer meeting.
MLIR is one of those things everyone seems to use, but nobody seems to want to write solid introductory docs for :(
I've been curious for a few years now to get into MLIR, but I don't know compilers or LLVM, and all the docs I've found seem to assume knowledge of one or the other.
(yes, this is a plea for someone to write an 'intro to compilers' using MLIR)
Not sure if you will be able to follow along, but here is what I was talking about:
"PyDSL: A MLIR DSL for Python developers"
https://www.youtube.com/watch?v=iYLxgTRe8TU
"PyDSL, a subset of Python for constructing affine & transform dialects"
https://www.youtube.com/watch?v=nmtHeRkl850
And the MLIR channel:
https://www.youtube.com/@MLIRCompiler