
Comment by t55

10 months ago

Triton sits between CUDA and PyTorch in terms of abstraction level and is built to work smoothly within the PyTorch ecosystem. In CUDA, on the other hand, you can directly manipulate warp-level primitives and fine-tune memory prefetching to reduce latency in, e.g., attention algorithms, a level of control that Triton and PyTorch don't offer AFAIK.
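
To make "warp-level primitives" concrete: below is a minimal sketch (my own toy example, not taken from Triton, PyTorch, or any attention implementation) of a warp-wide sum reduction using the __shfl_down_sync intrinsic. This register-to-register exchange is the kind of thing you write directly in CUDA, whereas Triton's block-level programming model handles that layer for you.

    #include <cstdio>
    #include <cuda_runtime.h>

    // Warp-wide sum reduction via shuffle intrinsics: the 32 lanes of a warp
    // exchange values register-to-register, no shared memory involved.
    __global__ void warp_sum(const float* in, float* out) {
        float v = in[threadIdx.x];
        // Tree reduction across the warp; after the loop, lane 0 holds the sum.
        for (int offset = 16; offset > 0; offset >>= 1)
            v += __shfl_down_sync(0xffffffff, v, offset);
        if (threadIdx.x == 0) *out = v;
    }

    int main() {
        float h_in[32], h_out = 0.0f;
        for (int i = 0; i < 32; ++i) h_in[i] = 1.0f;  // expected sum: 32
        float *d_in, *d_out;
        cudaMalloc(&d_in, sizeof(h_in));
        cudaMalloc(&d_out, sizeof(float));
        cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);
        warp_sum<<<1, 32>>>(d_in, d_out);
        cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
        printf("sum = %f\n", h_out);  // prints 32.000000
        cudaFree(d_in);
        cudaFree(d_out);
        return 0;
    }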

MLIR's Python extensions do, though, as far as I could tell from the LLVM developer meeting.