Comment by markisus
4 hours ago
The article mentions Triton for this purpose. I don’t think you will get maxed out performance on the hardware though because abstraction layers won’t let you access the fastest possible path.
4 hours ago
> I don’t think you will get maxed out performance on the hardware though because abstraction layers won’t let you access the fastest possible path.
You could argue the same about CPU architectures, no? Yet compilers solve this pretty well most of the time.