Comment by vlovich123
1 year ago
Disclaimer: not an expert here so could be very very wrong. This is just my understanding so happy to be corrected.
Another would be that something like fused multiply-add has different (higher, if I recall correctly) precision, which violates IEEE 754 and thus prevents vectorization, since the default options are standard-compliant.
Another is that some math intrinsics are documented to set errno, which would prevent autovectorization in paths that call one.
There may be other nuances depending on float vs double.
Basically, I believe most of the things that make up -ffast-math are things that would otherwise prevent autovectorization.
Fused multiply-add applies equally to scalar and vectorized code (and C actually allows compilers to fuse them; there's -ffp-contract=off / the FP_CONTRACT pragma to turn that off). The compiler/autovectorizer can trivially leave the multiply & add separate if so requested. Slower than having them fused? Perhaps. But it has no impact at all on scalar vs vector, given that both have the same FMA applicability.
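To make that concrete, here's a rough sketch of the kind of loop in question. The function name mul_add_all is made up for illustration. Every iteration is independent, so whether the compiler fuses the multiply and add (the default contraction behaviour varies by compiler) or keeps them separate (-ffp-contract=off), the same choice is available for the scalar and the vector version of the loop:

```c
#include <stddef.h>

/* Hypothetical example: out[i] = a[i] * b[i] + c[i].
   Iterations do not depend on each other, so the loop can be vectorized
   with either a fused multiply-add or a separate multiply and add. */
void mul_add_all(float *restrict out, const float *restrict a,
                 const float *restrict b, const float *restrict c,
                 size_t n) {
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] * b[i] + c[i];
    }
}
```

Comparing the assembly at -O3 with and without -ffp-contract=off should show the difference in the arithmetic instructions only, not in whether the loop got vectorized (I haven't checked every compiler/target combination).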
For <math.h> errno, there's -fno-math-errno; indeed included in -ffast-math, but you don't need the entirety of that mess for this.
Loops with a float accumulator are, I believe, the only case where -ffast-math is actually required for autovectorizability (and even then, iirc, there are some sub-flags such that you can get the associativity-assuming optimizations while still allowing NaN/inf).
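For reference, the accumulator case looks something like this (sum_floats is a made-up name). Float addition isn't associative, so under strict semantics the adds have to happen in source order, and splitting the sum across vector lanes would change the rounding:

```c
#include <stddef.h>

/* Hypothetical example: sum of n floats.
   Each add depends on the previous one through acc, and reordering the
   adds can change the rounded result, so the compiler may only vectorize
   this if it is told that reassociation is acceptable. */
float sum_floats(const float *x, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++) {
        acc += x[i];
    }
    return acc;
}
```

As far as I know, the relevant sub-flag in GCC is -fassociative-math, which the GCC docs say also needs -fno-signed-zeros and -fno-trapping-math in effect. That combination doesn't include -ffinite-math-only, so NaN/inf are still handled.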