← Back to context

Comment by shmerl

9 hours ago

> I don’t think you will get maxed out performance on the hardware though because abstraction layers won’t let you access the fastest possible path.

You could argue about CPU architectures the same, no? Yet compilers solve this pretty well most of the time.

Sort of not really. Compilers are fantastic for the typical stuff and that includes the compilers in the CUDA/ROCm/Vulkan/etc stacks. But on the CPU for the rare critical bits where you care about every last cycle or other inane details for whatever reason you're often all but forced to fall back on intrinsics and microarch specific code paths.