Comment by nnevatie

21 days ago

I found this a weird article.

If you wish to see some speedups using AVX512, without limiting yourself to C or C++, you might want to try ISPC (https://ispc.github.io/index.html).

You'll get sane aliasing rules from the perspective of performance, multi-target binaries with dynamic dispatching and a lot more control over the code generated.
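To make the aliasing point concrete, here is a minimal C99 sketch (the function names are made up for illustration, and this is plain C, not ISPC). In C the compiler has to assume `out` and `in` might overlap unless you promise otherwise with `restrict`, which can get in the way of vectorizing even a trivial loop.

```c
#include <stddef.h>

/* Without restrict, the compiler must assume out and in may overlap,
   so it may add a runtime overlap check or generate more conservative code. */
void scale_may_alias(float *out, const float *in, float k, size_t n) {
    for (size_t i = 0; i < n; ++i)
        out[i] = in[i] * k;
}

/* restrict is the caller's promise that out and in do not overlap,
   which leaves the compiler free to vectorize the loop directly. */
void scale_no_alias(float *restrict out, const float *restrict in,
                    float k, size_t n) {
    for (size_t i = 0; i < n; ++i)
        out[i] = in[i] * k;
}
```

Both functions compute the same result for non-overlapping buffers; the difference is only in what the optimizer is allowed to assume.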

7 comments

theowaway 21 days ago

ispc is something that deserves to be much more widely known; it does an excellent job of bringing the CUDA programming model to CPUs.
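For anyone who hasn't seen that model: you write a scalar-looking body, and it runs once per SIMD lane, with branches turned into per-lane masks. A rough emulation in plain C (not ISPC code; `GANG` and the function name are assumptions for illustration) looks something like this:

```c
#include <stddef.h>

#define GANG 8  /* hypothetical gang width, e.g. 8 float lanes */

/* Emulates SPMD execution: every lane of a gang runs the same body.
   The inner loop stands in for what would be one vector operation. */
void clamp_negative_to_zero(float *data, size_t n) {
    for (size_t base = 0; base < n; base += GANG) {
        for (size_t lane = 0; lane < GANG; ++lane) {
            size_t i = base + lane;
            int active = i < n;                   /* tail mask: lanes past the end stay idle */
            int taken = active && data[i] < 0.0f; /* lanes where the branch is taken */
            if (taken)
                data[i] = 0.0f;
        }
    }
}
```

A real SPMD compiler would map the inner loop onto vector registers and masked stores instead of iterating over lanes one at a time.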

grumbelbart2 21 days ago

Is there a way to compile it for something other than x86, like ARM/AArch64?

nnevatie 21 days ago

> It currently supports multiple flavours of x86 (SSE2, SSE4, AVX, AVX2, and AVX512), ARM (NEON), and Intel® GPU architectures (Xe family).

shihab 21 days ago

Hi, I actually mentioned ISPC several times there. And although I strenuously avoided crowning one approach "better" than the other, it is worth pointing out that 1) many of these benefits of ISPC can be had from explicit SIMD libraries like Google's Highway, and 2) ISPC (or any SIMT model) is a departure from how the underlying hardware works, and as the AI community is discovering with GPUs, this abstraction can sometimes be a lot more headache than it's worth.

majke 21 days ago

ISPC looks interesting. Does it work with AMD? They hint at GPUs, I guess mostly Intel ones?

dataking 21 days ago

Yes, it works with AMD CPUs as well as various ARM ones, e.g. Apple silicon.
See for instance https://github.com/ispc/ispc/pull/2160

nnevatie 21 days ago

Yes, works well with AMD. You can compile multi-target so that you'll have e.g. SSE4.2, AVX2, and AVX512 support built into your binary, and the best (widest) version the CPU supports is picked automatically at runtime.
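If you've never seen this kind of dispatch done by hand, here is a rough sketch of the idea in plain C. It is not what ISPC generates: the function names are hypothetical, and it leans on the GCC/Clang `target` attribute and `__builtin_cpu_supports`, which only apply on x86. The same loop is compiled once as a baseline and once with AVX2 enabled, and a selector picks one at runtime.

```c
#include <stddef.h>

typedef void (*add_fn)(float *restrict, const float *restrict,
                       const float *restrict, size_t);

/* Baseline version, compiled for whatever the default target is. */
static void add_scalar(float *restrict out, const float *restrict a,
                       const float *restrict b, size_t n) {
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* Same loop, but the compiler is allowed to use AVX2 in this function only. */
__attribute__((target("avx2")))
static void add_avx2(float *restrict out, const float *restrict a,
                     const float *restrict b, size_t n) {
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}
#endif

/* Picks the widest version the running CPU supports. */
add_fn select_add(void) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return add_avx2;
#endif
    return add_scalar;
}
```

You would call `select_add()` once at startup, keep the returned pointer, and use it for every call. With ISPC you don't write any of this yourself, which is the point.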