Comment by fc417fc802

4 hours ago

Sort of not really. Compilers are fantastic for the typical stuff and that includes the compilers in the CUDA/ROCm/Vulkan/etc stacks. But on the CPU for the rare critical bits where you care about every last cycle or other inane details for whatever reason you're often all but forced to fall back on intrinsics and microarch specific code paths.

Yeah, that's why I said most of the time. Sometimes even for CPUs things need assembly. But no one stops you using GPU assembly either when needed I suppose? It should not be the default approach probably.