Comment by t55

5 months ago

Triton sits between CUDA and PyTorch and is built to work smoothly within the PyTorch ecosystem. CUDA, on the other hand, lets you directly manipulate warp-level primitives and fine-tune memory prefetching to reduce latency in, e.g., attention algorithms. AFAIK, Triton and PyTorch don't offer that level of control.
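
To make "warp-level primitives" concrete: in CUDA, the 32 threads of a warp can exchange register values directly with intrinsics such as `__shfl_down_sync`. A common pattern is the warp sum `v += __shfl_down_sync(0xffffffff, v, offset)` for offset = 16, 8, 4, 2, 1. Below is a rough plain-Python model of what that loop computes. It is a sketch, not GPU code, and it assumes a full 32-lane warp with every lane active. The helper names `shfl_down` and `warp_reduce_sum` are hypothetical.

```python
WARP_SIZE = 32  # lanes per warp on NVIDIA GPUs


def shfl_down(values, delta):
    """Hypothetical model of __shfl_down_sync with a full mask.

    Lane i reads the value held by lane i + delta. Lanes whose source would
    fall past the end of the warp keep their own value (no wrap-around).
    """
    n = len(values)
    return [values[i + delta] if i + delta < n else values[i] for i in range(n)]


def warp_reduce_sum(values):
    """Tree reduction in log2(32) = 5 shuffle steps; lane 0 ends with the total."""
    assert len(values) == WARP_SIZE
    v = list(values)
    offset = WARP_SIZE // 2
    while offset > 0:
        shifted = shfl_down(v, offset)          # every lane reads a neighbour's register
        v = [a + b for a, b in zip(v, shifted)]  # v += __shfl_down_sync(..., v, offset)
        offset //= 2
    return v[0]  # only lane 0's value is meaningful afterwards


print(warp_reduce_sum(list(range(WARP_SIZE))))  # → 496
```

In a real kernel, each thread runs this loop on its own register, and the partial sums move between lanes without a round trip through shared memory. Triton instead generates this kind of code for you, which is the trade-off described above.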

pjmlp 5 months ago

MLIR extensions for Python do, though, as far as I could tell from the LLVM Developers' Meeting.

6gvONxR4sf7o 5 months ago
MLIR is one of those things everyone seems to use, but nobody seems to want to write solid introductory docs for :(
I've been curious for a few years now to get into MLIR, but I don't know compilers or LLVM, and all the docs I've found seem to assume knowledge of one or the other.
(yes, this is a plea for someone to write an 'intro to compilers' using MLIR)
- pjmlp 5 months ago
  
  Not sure if you will be able to follow along, but here is what I was talking about:
  "PyDSL: A MLIR DSL for Python developers"
  https://www.youtube.com/watch?v=iYLxgTRe8TU
  "PyDSL, a subset of Python for constructing affine & transform dialects"
  https://www.youtube.com/watch?v=nmtHeRkl850
  And the MLIR YouTube channel:
  https://www.youtube.com/@MLIRCompiler