Comment by nnevatie
21 days ago
I found this a weird article.
If you wish to see some speedups using AVX512, without limiting yourself to C or C++, you might want to try ISPC (https://ispc.github.io/index.html).
You'll get sane aliasing rules from the perspective of performance, multi-target binaries with dynamic dispatching and a lot more control over the code generated.
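For context on the aliasing point, here is a minimal sketch of the C side only; ISPC's own aliasing rules are not shown. In C, the compiler must assume two pointers of compatible type may overlap unless told otherwise, and the standard way to tell it is the C99 `restrict` qualifier. The function names below are hypothetical.

```c
#include <stddef.h>

/* Without `restrict`, the compiler must assume y may overlap x, so a store
   to y[i] could change a later x[j]. That typically forces a runtime overlap
   check before the vectorized loop, or blocks vectorization entirely. */
void saxpy_maybe_alias(size_t n, float a, const float *x, float *y) {
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* `restrict` is a promise from the caller that x and y do not overlap,
   which lets the compiler vectorize without extra checks. Passing
   overlapping pointers here is undefined behavior. */
void saxpy_restrict(size_t n, float a, const float *restrict x,
                    float *restrict y) {
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

Both functions compute the same result for non-overlapping arrays; the difference is only in what the optimizer is allowed to assume.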
ISPC deserves to be much more widely known: it does an excellent job of bringing the CUDA programming model to CPUs.
Is there a way to compile it for something other than x86, like ARM/AArch64?
> It currently supports multiple flavours of x86 (SSE2, SSE4, AVX, AVX2, and AVX512), ARM (NEON), and Intel® GPU architectures (Xe family).
Hi, I actually mentioned ISPC several times there. And although I strenuously avoided crowning one approach "better" than the other, it is worth pointing out that 1) many of ISPC's benefits can be had from explicit SIMD libraries like Google's Highway, and 2) ISPC (or any SIMT model) is a departure from how the underlying hardware works, and as the AI community is discovering with GPUs, this abstraction can sometimes be a lot more headache than it's worth.
ISPC looks interesting. Does it work with AMD? They hint at GPUs; I guess mostly Intel ones?
Yes, it works with AMD CPUs as well as various ARM ones, e.g. Apple silicon.
See for instance https://github.com/ispc/ispc/pull/2160
Yes, works well with AMD. You can compile multi-target so that you'll have e.g. SSE4.2, AVX2, AVX512 support built to your binaries and the best (widest) version is picked by the runtime automatically.
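To illustrate what "the best (widest) version is picked by the runtime" means, here is a hand-written C analogue. This is not ISPC's generated code; it is a minimal sketch assuming GCC or Clang on x86, using the `target` function attribute and `__builtin_cpu_supports` to build several variants of one kernel and choose among them at run time. The names `add_fn`, `add_generic`, `add_avx2`, `add_avx512` and `select_add` are hypothetical.

```c
#include <stddef.h>

typedef void (*add_fn)(float *dst, const float *a, const float *b, size_t n);

/* Baseline variant: plain C, safe on any CPU. */
static void add_generic(float *dst, const float *a, const float *b, size_t n) {
    for (size_t i = 0; i < n; ++i) dst[i] = a[i] + b[i];
}

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
/* Same loop, compiled so the optimizer may use AVX2 instructions.
   Only called after the runtime check confirms AVX2 support. */
__attribute__((target("avx2")))
static void add_avx2(float *dst, const float *a, const float *b, size_t n) {
    for (size_t i = 0; i < n; ++i) dst[i] = a[i] + b[i];
}

/* Same loop again, allowed to use AVX-512F instructions. */
__attribute__((target("avx512f")))
static void add_avx512(float *dst, const float *a, const float *b, size_t n) {
    for (size_t i = 0; i < n; ++i) dst[i] = a[i] + b[i];
}
#endif

/* Return the widest variant the running CPU supports.
   On non-x86 targets (or other compilers) this falls back to add_generic. */
add_fn select_add(void) {
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f")) return add_avx512;
    if (__builtin_cpu_supports("avx2")) return add_avx2;
#endif
    return add_generic;
}
```

A caller resolves the function pointer once, e.g. `add_fn add = select_add();`, then calls `add(dst, a, b, n)` in hot loops. ISPC's multi-target builds automate this selection, so the per-target variants and the check do not have to be written by hand.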