Comment by nnevatie

7 hours ago

I found this a weird article.

If you wish to see some speedups using AVX512, without limiting yourself to C or C++, you might want to try ISPC (https://ispc.github.io/index.html).

You'll get aliasing rules that are sane from a performance perspective, multi-target binaries with dynamic dispatch, and a lot more control over the generated code.
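To make the aliasing point concrete, here is a minimal sketch in plain C (not ISPC, and not taken from the article). In C, the compiler has to assume that `out` may overlap `x` or `y` unless you promise otherwise with `restrict`, which can block vectorization or force a runtime overlap check. The function names `saxpy_may_alias` and `saxpy_no_alias` are hypothetical, and whether a given compiler actually emits different code for the two is something to check in the generated assembly rather than take on faith.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: without restrict, the compiler must assume that
 * out may overlap x or y, so every store to out[i] could change a later
 * load from x or y. */
void saxpy_may_alias(size_t n, float a, const float *x, const float *y,
                     float *out)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = a * x[i] + y[i];
}

/* Hypothetical example: restrict (C99) is a promise from the caller that
 * the three arrays do not overlap, so the loop is free to be vectorized
 * without an overlap check. Breaking the promise is undefined behavior. */
void saxpy_no_alias(size_t n, float a, const float *restrict x,
                    const float *restrict y, float *restrict out)
{
    /* Cheap sanity check only: it catches identical base pointers,
     * not partial overlap. */
    assert(out != x && out != y);

    for (size_t i = 0; i < n; ++i)
        out[i] = a * x[i] + y[i];
}
```

Both functions compute the same result for non-overlapping arrays. The difference is only in what the optimizer is allowed to assume, which is the kind of thing a language with stricter aliasing rules gives you without per-pointer annotations.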

ISPC deserves to be much more widely known. It does an excellent job of bringing the CUDA programming model to CPUs.

  • Is there a way to compile it for something other than x86, like ARM/AArch64?

    • > It currently supports multiple flavours of x86 (SSE2, SSE4, AVX, AVX2, and AVX512), ARM (NEON), and Intel® GPU architectures (Xe family).