Comment by ashvardanian

10 hours ago

Agreed! I was looking through the summation example <https://github.com/tracel-ai/cubecl/blob/main/examples/sum_t...> and it seems like the primary focus is on more traditional, pre-2018-style GPU programming, without explicit warp-level operations, asynchrony, atomics, barriers, or the many tensor-core operations.

The project feels very nice, and it would be great to have more notes in the README on the excluded functionality, to better scope its applicability in more advanced GPGPU scenarios.

We support warp operations, barriers for CUDA, atomics for most backends, and tensor core instructions as well. It's just not well documented in the README!

  • Amazing! Would love to try them! If possible, I'd also ask for a table translating between CubeCL and CUDA terminology. It seems like CUDA Warps are called Planes in CubeCL, and it’s probably not the only difference.
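For context, here is a minimal sketch of what a plane-level reduction might look like, pieced together from the style of the repo's other examples. The `plane_sum` intrinsic and the `ABSOLUTE_POS`/`UNIT_POS`/`CUBE_POS` topology constants are my best reading of the CubeCL API, and the mapping Unit ≈ CUDA thread, Plane ≈ warp, Cube ≈ thread block is inferred, not official:

```rust
use cubecl::prelude::*;

// Hypothetical kernel, for illustration only -- the intrinsic and
// constant names are my best guess at the CubeCL API, not taken
// verbatim from the repo's examples.
#[cube(launch)]
fn plane_partial_sums<F: Float>(input: &Array<F>, output: &mut Array<F>) {
    // Each unit (CUDA: thread) loads one element, guarding the tail.
    let mut value = F::new(0.0);
    if ABSOLUTE_POS < input.len() {
        value = input[ABSOLUTE_POS];
    }

    // plane_sum is the plane-level (CUDA: warp-level) reduction;
    // every unit in the plane gets the same total back.
    let total = plane_sum(value);

    // If the cube is launched one plane wide, unit 0 of the cube is
    // also lane 0 of the plane, so it writes this cube's partial sum
    // (CUDA: one value per block).
    if UNIT_POS == 0 {
        output[CUBE_POS] = total;
    }
}
```

One caveat: the plane width (CUDA's warpSize) is hardware-dependent, so the cube dimensions would have to match it at launch time, e.g. 32 on NVIDIA GPUs.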

CubeCL is the compute backend for Burn (https://burn.dev/), an ML framework by the same team that does all the tensor magic like autodiff, op fusion, and dynamic graphs.