Comment by variadix
1 year ago
C compilers are still pretty bad at auto vectorization. For problems where SIMD is applicable, you can reasonably expect a 2x-16x speed up over the naive scalar implementation.
1 year ago
C compilers are still pretty bad at auto vectorization. For problems where SIMD is applicable, you can reasonably expect a 2x-16x speed up over the naive scalar implementation.
Also, if you write code with intrinsics the autovectorization can make it _worse_. eg a pattern is to write a SIMD main loop and then a scalar tail, but it can autovectorize that and mess it up.
Given the wider availability of masking (AVX-512, RISC-V and SVE), I figure scalar tails are no longer the preferred pattern everywhere.