Comment by nitrogen99

2 days ago

If you are a Python dev, why not just use Triton?

Triton sits between CUDA and PyTorch and is built to work smoothly within the PyTorch ecosystem. CUDA, on the other hand, lets you directly manipulate warp-level primitives and fine-tune memory prefetching to reduce latency in e.g. attention kernels, a level of control that Triton and PyTorch don't offer AFAIK.
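
To make that concrete, here's a minimal sketch of the kind of warp-level primitive I mean: a warp-wide max reduction built on __shfl_down_sync, the sort of building block behind the numerically stable softmax in fused attention kernels. The kernel and names here are illustrative, not taken from any particular library:

    // Warp-level max reduction of the kind used inside fused attention
    // kernels (e.g. the running max in an online softmax). Each of the
    // 32 lanes in a warp contributes one value; the shuffles move data
    // register-to-register, with no shared-memory round trip.
    #include <cstdio>
    #include <cuda_runtime.h>

    __device__ float warp_reduce_max(float v) {
        // Tree reduction over register-to-register shuffles.
        for (int offset = 16; offset > 0; offset >>= 1)
            v = fmaxf(v, __shfl_down_sync(0xffffffffu, v, offset));
        // Lane 0 now holds the warp max; broadcast it to all lanes.
        return __shfl_sync(0xffffffffu, v, 0);
    }

    __global__ void demo(const float *in, float *out) {
        float m = warp_reduce_max(in[threadIdx.x]);
        if (threadIdx.x == 0) *out = m;
    }

    int main() {
        float h_in[32], h_out, *d_in, *d_out;
        for (int i = 0; i < 32; ++i) h_in[i] = (float)i;
        cudaMalloc(&d_in, sizeof(h_in));
        cudaMalloc(&d_out, sizeof(float));
        cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);
        demo<<<1, 32>>>(d_in, d_out);  // one warp
        cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
        printf("warp max = %f\n", h_out);  // expect 31.0
        cudaFree(d_in); cudaFree(d_out);
        return 0;
    }

In Triton you'd express this as a tl.max over a block and the compiler picks the shuffle schedule for you; in CUDA you choose it yourself, which is exactly the control being traded away.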

  • MLIR extensions for Python do though, as far as I could tell from the LLVM developer meeting.

    • MLIR is one of those things everyone seems to use, but nobody seems to want to write solid introductory docs for :(

      I've been curious for a few years now to get into MLIR, but I don't know compilers or LLVM, and all the docs I've found seem to assume knowledge of one or the other.

      (yes this is a plea for someone to write an 'intro to compilers' using MLIR)