Comment by astrange

1 year ago

Also, if you write code with intrinsics, autovectorization can make it _worse_. E.g. a common pattern is to write a SIMD main loop followed by a scalar tail, but the compiler can autovectorize the tail too and mess it up.
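To make the pattern concrete, here's a rough sketch of the shape I mean. The function name `add_f32` and the choice of SSE are just for illustration, and the `#if` guard is only there so it still builds on non-x86 targets:

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical example: dst[i] = a[i] + b[i] for n floats. */
void add_f32(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    /* Hand-written SIMD main loop: 4 floats per iteration. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar tail: at most 3 iterations once the SIMD loop has run,
       but the compiler may not know that and can still try to
       vectorize this loop, adding its own vector body and checks. */
    for (; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

Whether the tail actually gets vectorized depends on the compiler and flags, so check the generated assembly. With Clang, putting `#pragma clang loop vectorize(disable)` directly before the tail loop is one way to ask it not to.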

Given the wider availability of masking (AVX-512, RISC-V and SVE), I figure scalar tails are no longer the preferred pattern everywhere.
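For comparison, a masked version with no separate tail might look roughly like this on AVX-512. Again the name `add_f32_masked` is made up, and it only takes the AVX-512 path when built with something like `-mavx512f`; otherwise it falls back to a plain loop so it still compiles:

```c
#include <stddef.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical example: dst[i] = a[i] + b[i] for n floats. */
void add_f32_masked(float *dst, const float *a, const float *b, size_t n)
{
#if defined(__AVX512F__)
    for (size_t i = 0; i < n; i += 16) {
        size_t left = n - i;
        /* All 16 lanes for full iterations, only the low `left`
           lanes for the final partial one. */
        __mmask16 k = left >= 16 ? (__mmask16)0xFFFF
                                 : (__mmask16)((1u << left) - 1u);
        /* Masked-off lanes are neither read nor written. */
        __m512 va = _mm512_maskz_loadu_ps(k, a + i);
        __m512 vb = _mm512_maskz_loadu_ps(k, b + i);
        _mm512_mask_storeu_ps(dst + i, k, _mm512_add_ps(va, vb));
    }
#else
    /* Fallback for builds without AVX-512F. */
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
#endif
}
```

The last iteration just runs with a partial mask, so there's no scalar loop left over for the autovectorizer to touch.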