Comment by xxpor

5 years ago

Handwritten ASM for perf is almost never worth it in modern times. C compiled with GCC/Clang will almost always be just as fast or faster. You might use some inline ASM to use a specific instruction if the compiler doesn't support generating it yet (like AVX512 or AES), but even for that there's probably an intrinsic available. You can still inspect the output to make sure it's not doing anything stupid.

Plus it's C so it's infinitely more maintainable and way more portable.
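
To make the intrinsic route concrete, here is a minimal sketch in C, assuming GCC or Clang. `count_bits` is a made-up name for illustration; `__builtin_popcountll` is the compiler builtin, which usually becomes a single `popcnt` instruction when the target supports it (for example with `-mpopcnt`) and a software routine otherwise.

```c
#include <stdint.h>

/* Hypothetical helper: count set bits without any inline asm. */
static inline int count_bits(uint64_t x)
{
#if defined(__GNUC__) || defined(__clang__)
    /* Compiler builtin; the compiler picks the instruction sequence. */
    return __builtin_popcountll(x);
#else
    /* Portable fallback: clear the lowest set bit until none remain. */
    int n = 0;
    while (x) {
        x &= x - 1;
        n++;
    }
    return n;
#endif
}
```

You can still dump the assembly (e.g. `gcc -O2 -S`) to check what it actually emitted.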

The x86 intrinsics are so hard to read because of terrible Wintel Hungarian naming conventions that I think it’s much clearer to write your SIMD in assembly. It’s usually easy enough to follow asm if there aren’t complicated memory accesses anyway. The major issue is not having good enough debug info.

  • I honestly don't think I've seen native Windows code in over 20 years at this point. Obviously there's a ton of C++ out there, it's just basically as far away from me as possible.

But this seems to be an edge case where you have to rely on functional programming and experimental compiler flags to get the machine code that you want.

Portability is typically not a big issue, because you can have a fallback C++ implementation.
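
A rough sketch of that fallback pattern, written in C here rather than C++ and assuming a compiler that defines `__SSE2__` when SSE2 is enabled (GCC and Clang do). `add_floats` is a hypothetical function name; the `_mm_*` calls are the standard SSE intrinsics from `<emmintrin.h>`.

```c
#include <stddef.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE/SSE2 intrinsics */
#endif

/* Hypothetical helper: dst[i] = a[i] + b[i] for n floats. */
void add_floats(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    /* SIMD path: four floats per iteration, unaligned loads and stores. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
    }
#endif
    /* Portable path: handles the tail, or everything when SSE2 is absent. */
    for (; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

On a target without SSE2 the vector loop just compiles away and the plain loop does all the work, so the same source builds everywhere.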