Comment by nnevatie
7 hours ago
I found this a weird article.
If you wish to see some speedups using AVX512, without limiting yourself to C or C++, you might want to try ISPC (https://ispc.github.io/index.html).
You'll get sane aliasing rules from the perspective of performance, multi-target binaries with dynamic dispatching and a lot more control over the code generated.
ISPC deserves to be much more widely known. It does an excellent job of bringing the CUDA programming model to CPUs.
Is there a way to compile it for something other than x86, like ARM/AArch64?
> It currently supports multiple flavours of x86 (SSE2, SSE4, AVX, AVX2, and AVX512), ARM (NEON), and Intel® GPU architectures (Xe family).
ISPC looks interesting. Does it work with AMD? They hint at GPUs, I guess mostly Intel ones?
Yes, it works well with AMD. You can compile for multiple targets so that your binaries include, e.g., SSE4.2, AVX2, and AVX512 code paths, and the best (widest) supported version is picked automatically at runtime.
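To illustrate the idea of picking the widest supported version at runtime, here is a minimal hand-written sketch in C++. It is not what ISPC generates. The function names (`sum_sse42`, `sum_avx2`, `sum_avx512`, `pick_variant`) are hypothetical, and the per-target variants all share one portable body as stand-ins for code that would really be compiled per instruction set. It assumes GCC or Clang on x86 for the CPU feature check and falls back to a generic path elsewhere.

```cpp
#include <cstddef>

// Portable reference implementation, used as the fallback.
static float sum_generic(const float* x, std::size_t n) {
    float s = 0.0f;
    for (std::size_t i = 0; i < n; ++i) s += x[i];
    return s;
}

// Hypothetical per-target variants. In a real multi-target build each one
// would be compiled for a different instruction set; here they are stand-ins
// that simply forward to the portable body.
static float sum_sse42(const float* x, std::size_t n)  { return sum_generic(x, n); }
static float sum_avx2(const float* x, std::size_t n)   { return sum_generic(x, n); }
static float sum_avx512(const float* x, std::size_t n) { return sum_generic(x, n); }

using SumFn = float (*)(const float*, std::size_t);

struct Variant {
    const char* name;
    SumFn fn;
};

// Pick the widest variant the running CPU reports support for.
static Variant pick_variant() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f")) return {"avx512", sum_avx512};
    if (__builtin_cpu_supports("avx2"))    return {"avx2", sum_avx2};
    if (__builtin_cpu_supports("sse4.2"))  return {"sse4.2", sum_sse42};
#endif
    return {"generic", sum_generic};
}

// Public entry point: resolve the variant once, then call through the pointer.
float sum(const float* x, std::size_t n) {
    static const Variant v = pick_variant();
    return v.fn(x, n);
}

// Name of the variant that would be selected on this machine.
const char* selected_variant() { return pick_variant().name; }
```

Callers only ever see `sum(data, n)`. Which variant runs depends on the machine, and `selected_variant()` reports the choice. The same binary works on an AMD or Intel CPU as long as the check is on the feature, not the vendor.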
Yes, it works with AMD CPUs as well as various ARM ones, e.g. Apple silicon.
See for instance https://github.com/ispc/ispc/pull/2160