Comment by astrange
1 year ago
Also, if you write code with intrinsics, autovectorization can make it _worse_. E.g. a common pattern is to write a SIMD main loop and then a scalar tail, but the compiler can autovectorize the tail and mess it up.
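For anyone who hasn't seen the pattern, here is a rough sketch of it, not taken from any particular codebase. The function name `add_arrays` and the kernel (`dst[i] += src[i]`) are made up for illustration, and SSE is just an assumed target. The pragmas before the tail are one way to ask the compiler to leave it scalar: `#pragma clang loop vectorize(disable)` on Clang and `#pragma GCC novector` on GCC 14 and later. Other compilers are simply left alone here.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics: _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */
#endif

/* Hypothetical example kernel: dst[i] += src[i] for i in [0, n). */
void add_arrays(float *dst, const float *src, size_t n)
{
    size_t i = 0;
    assert(n == 0 || (dst != NULL && src != NULL));

#if defined(__SSE__)
    /* Hand-written SIMD main loop: 4 floats per iteration. */
    for (; i + 4 <= n; i += 4) {
        __m128 a = _mm_loadu_ps(dst + i);
        __m128 b = _mm_loadu_ps(src + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(a, b));
    }
#endif

    /* Scalar tail: at most 3 elements remain on the SSE path.
     * Without a hint, the compiler may try to vectorize this loop too,
     * adding setup and runtime checks for a loop that barely runs.
     * (If SSE is unavailable, this loop handles everything and the
     * pragma keeps all of it scalar, which is fine for a sketch.) */
#if defined(__clang__)
#pragma clang loop vectorize(disable) interleave(disable)
#elif defined(__GNUC__) && __GNUC__ >= 14
#pragma GCC novector
#endif
    for (; i < n; i++)
        dst[i] += src[i];
}
```

Whether the tail actually gets vectorized without the pragma depends on the compiler version and flags, so it is worth checking the generated assembly rather than assuming either way.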
Given the wider availability of masking (AVX-512, the RISC-V vector extension, and SVE), I figure scalar tails are no longer the preferred pattern everywhere.
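A minimal sketch of what a masked tail can look like, assuming AVX-512F as the target. The function name `add_arrays_masked` and the kernel are hypothetical, and the code falls back to a plain loop when AVX-512F isn't enabled at compile time. The final partial vector is handled by the same loop body with a shorter mask, so there is no separate scalar tail.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__AVX512F__)
#include <immintrin.h>  /* AVX-512F intrinsics */
#endif

/* Hypothetical example kernel: dst[i] += src[i] for i in [0, n). */
void add_arrays_masked(float *dst, const float *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));

#if defined(__AVX512F__)
    for (size_t i = 0; i < n; i += 16) {
        size_t rem = n - i;
        /* All 16 lanes active for full vectors; only the low `rem`
         * lanes active for the final partial vector. */
        __mmask16 k = rem >= 16 ? (__mmask16)0xFFFF
                                : (__mmask16)((1u << rem) - 1u);
        /* Masked-off lanes are not read from or written to memory,
         * so the last iteration does not touch elements past n. */
        __m512 a = _mm512_maskz_loadu_ps(k, dst + i);
        __m512 b = _mm512_maskz_loadu_ps(k, src + i);
        _mm512_mask_storeu_ps(dst + i, k, _mm512_add_ps(a, b));
    }
#else
    /* Fallback when AVX-512F is not enabled (assumption: correctness
     * only, no attempt at performance). */
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i];
#endif
}
```

This version computes the mask on every iteration for simplicity. A common variation is to keep an unmasked main loop and run the masked body once for the remainder.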