Comment by aidenn0

1 year ago

I don't know what the state of the art is today, but historically compilers have handled inline assembly poorly: they inhibit optimizations in the code around an asm block, so inline asm is often slower than intrinsics. For DSP code, the performance-critical part is usually a hot loop with a large number of iterations, so if the whole loop lives inside a standalone assembly function, the function-call overhead of calling it is negligible.

MSVC doesn't even support inline assembly on x64 or ARM targets (only on 32-bit x86), so to be portable across the big three compilers you have to use either intrinsics or standalone assembly.
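
To make the intrinsics option concrete, here is a rough sketch of a DSP-style hot loop (a dot product) written with SSE intrinsics, which GCC, Clang, and MSVC all accept. The function name `dot_f32` and the `HAVE_SSE` macro are made up for illustration, and the scalar fallback is only there so the same file still builds on non-x86 targets. Treat it as one possible shape, not a tuned kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Detect SSE on GCC/Clang (__SSE__) and MSVC (_M_X64 or _M_IX86_FP). */
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>  /* SSE intrinsics header shipped by all three compilers */
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* Hypothetical helper: dot product of two float arrays of length n. */
float dot_f32(const float *a, const float *b, size_t n)
{
    size_t i = 0;
    float sum = 0.0f;

    assert(a != NULL && b != NULL);

#if HAVE_SSE
    {
        __m128 acc = _mm_setzero_ps();
        float lanes[4];

        /* Hot loop: four multiply-adds per iteration, unaligned loads. */
        for (; i + 4 <= n; i += 4) {
            __m128 va = _mm_loadu_ps(a + i);
            __m128 vb = _mm_loadu_ps(b + i);
            acc = _mm_add_ps(acc, _mm_mul_ps(va, vb));
        }

        /* Horizontal sum of the four accumulator lanes. */
        _mm_storeu_ps(lanes, acc);
        sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    }
#endif

    /* Scalar tail (or the whole loop when SSE is unavailable). */
    for (; i < n; ++i)
        sum += a[i] * b[i];

    return sum;
}
```

Because the compiler understands what each intrinsic does, it is free to inline `dot_f32`, allocate registers, and schedule the surrounding code, which is exactly what it could not do around an opaque asm block.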