Comment by formerly_proven
13 hours ago
Lazy man's "kinda good enough for some cases SIMD in pure Rust" is to simply target x86-64-v3 (RUSTFLAGS=-Ctarget-cpu=x86-64-v3), which is supported by all AMD Zen CPUs and by Intel Core CPUs since Haswell. For floating-point reductions, which LLVM won't auto-vectorize because reordering the additions changes the rounding, "simply" write the loop with explicit four- or eight-way lanes, and LLVM will do the rest. Usually. Loops may also need explicit head or tail handling before they auto-vectorize. chunks_exact helps with this, because it hands you the leftover tail through remainder().
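Here is a minimal sketch of the "explicit lanes plus chunks_exact" pattern for an f32 sum. The function name `sum_8_lanes` is hypothetical. Eight lanes is an assumption chosen to match the eight f32 values in a 256-bit AVX register. Keeping eight independent accumulators is what makes the reordered additions explicit, which leaves LLVM free to use vector adds. Whether it actually emits them depends on the target flags and optimization level, so check the generated assembly rather than assuming it.

```rust
/// Sums `xs` using eight independent accumulators ("lanes").
/// The additions happen in a different order than a plain sequential sum,
/// so results can differ in the last bits for general inputs.
fn sum_8_lanes(xs: &[f32]) -> f32 {
    let mut acc = [0.0f32; 8];

    // chunks_exact yields only full 8-element chunks; the leftover
    // elements (fewer than 8) are available via remainder().
    let chunks = xs.chunks_exact(8);
    let tail = chunks.remainder();

    for chunk in chunks {
        // Each lane accumulates its own element of every chunk.
        for (a, &x) in acc.iter_mut().zip(chunk) {
            *a += x;
        }
    }

    // Combine the lanes, then fold in the tail with scalar code.
    let mut total: f32 = acc.iter().sum();
    for &x in tail {
        total += x;
    }
    total
}

fn main() {
    let v: Vec<f32> = (1..=20).map(|i| i as f32).collect();
    println!("sum = {}", sum_8_lanes(&v));
}
```

To get the x86-64-v3 instruction set mentioned above, build with something like `RUSTFLAGS="-C target-cpu=x86-64-v3" cargo build --release`. The resulting binary will not run on CPUs that lack those features.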