Comment by nitrogen99

2 days ago

If you are a Python dev, why not just use Triton?

Triton sits between CUDA and PyTorch and is built to work smoothly within the PyTorch ecosystem. CUDA, on the other hand, lets you directly manipulate warp-level primitives and fine-tune memory prefetching to reduce latency in e.g. attention kernels, a level of control that Triton and PyTorch don't offer AFAIK.
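
To make that concrete, here's a minimal sketch of the kind of warp-level primitive I mean: a warp-wide max reduction built on __shfl_down_sync, the sort of building block behind the numerically stable softmax in fused attention kernels. The kernel and names here are illustrative, not taken from any particular library:

    // Warp-level max reduction of the kind used inside fused attention
    // kernels (e.g. the running max in an online softmax). Each of the
    // 32 lanes in a warp contributes one value; the shuffles move data
    // register-to-register, with no shared-memory round trip.
    #include <cstdio>
    #include <cuda_runtime.h>

    __device__ float warp_reduce_max(float v) {
        // Tree reduction over register-to-register shuffles.
        for (int offset = 16; offset > 0; offset >>= 1)
            v = fmaxf(v, __shfl_down_sync(0xffffffffu, v, offset));
        // Lane 0 now holds the warp max; broadcast it to all lanes.
        return __shfl_sync(0xffffffffu, v, 0);
    }

    __global__ void demo(const float *in, float *out) {
        float m = warp_reduce_max(in[threadIdx.x]);
        if (threadIdx.x == 0) *out = m;
    }

    int main() {
        float h_in[32], h_out, *d_in, *d_out;
        for (int i = 0; i < 32; ++i) h_in[i] = (float)i;
        cudaMalloc(&d_in, sizeof(h_in));
        cudaMalloc(&d_out, sizeof(float));
        cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);
        demo<<<1, 32>>>(d_in, d_out);  // one warp
        cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
        printf("warp max = %f\n", h_out);  // expect 31.0
        cudaFree(d_in); cudaFree(d_out);
        return 0;
    }

In Triton you'd express this as a tl.max over a block and the compiler picks the shuffle schedule for you; in CUDA you choose it yourself, which is exactly the control being traded away.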

  • MLIR extensions for Python do though, as far as I could tell from the LLVM developer meeting.

    • MLIR is one of those things everyone seems to use, but nobody seems to want to write solid introductory docs for :(

      I've been curious for a few years now to get into MLIR, but I don't know compilers or LLVM, and all the docs I've found seem to assume knowledge of one or the other.

      (yes this is a plea for someone to write an 'intro to compilers' using MLIR)