Comment by CyberDildonics
1 year ago
> You might see a 10x difference if you compare meticulously optimized assembly to naive C in cases where vectorization is possible but the compiler fails to capitalize on that,
I can get far more than 10x over naive C just by reordering memory accesses. With SIMD it can be 7x more, but that can be done with ISPC; it doesn't need to be done in asm.
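To make "reordering memory accesses" concrete, here is a minimal sketch, not a benchmark. It assumes a row-major matrix of floats, and the function names are made up for illustration. Both functions compute the same sum; only the loop order differs. How large the gap is depends on matrix size, cache sizes, and compiler, and an optimizing compiler may interchange the loops on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Sum a rows x cols matrix stored in row-major order, walking down each
 * column first. Consecutive reads are cols * sizeof(float) bytes apart,
 * so on a large matrix most of them touch a different cache line. */
float sum_column_first(const float *m, size_t rows, size_t cols)
{
    assert(m != NULL);
    float total = 0.0f;
    for (size_t c = 0; c < cols; c++)
        for (size_t r = 0; r < rows; r++)
            total += m[r * cols + c];
    return total;
}

/* Same sum with the loops swapped. Reads are now contiguous in memory,
 * which is friendlier to the cache and the hardware prefetcher, and
 * gives the compiler a simpler loop to vectorize. */
float sum_row_first(const float *m, size_t rows, size_t cols)
{
    assert(m != NULL);
    float total = 0.0f;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            total += m[r * cols + c];
    return total;
}
```

Note that the two versions add the elements in a different order, so with arbitrary floating-point data the results can differ in the last bits.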
> I can get far more than 10x over naive C
However, you can write better-than-naive C by compiling and inspecting the compiler output.
I stopped writing assembly around Y2K, as I was fairly consistently getting beaten by the compiler when I wrote compiler-friendly high-level code. Memory organization is also something you can control fairly well from the high-level side.
Sure, some niches remained, but for my projects the gains were very modest compared to the time invested.
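As a rough illustration of what "compiler-friendly" code with controlled memory organization can look like, here is a sketch under assumptions: the function name and the particle-style data are hypothetical, and the positions and velocities are kept in separate contiguous arrays rather than interleaved in one struct. The `restrict` qualifiers promise the compiler that the two arrays do not overlap, which removes one common obstacle to auto-vectorization.

```c
#include <assert.h>
#include <stddef.h>

/* Advance n positions by their velocities over a time step dt.
 * x and vx are separate contiguous arrays (structure-of-arrays style),
 * so each iteration reads and writes adjacent elements.
 * restrict tells the compiler that x and vx never alias. */
void advance_x(float *restrict x, const float *restrict vx,
               float dt, size_t n)
{
    assert(n == 0 || (x != NULL && vx != NULL));
    for (size_t i = 0; i < n; i++)
        x[i] += vx[i] * dt;
}
```

To check what the compiler actually did with the loop, you can read the generated assembly (for example `gcc -O3 -S`) or ask for a vectorization report (`-fopt-info-vec` on GCC, `-Rpass=loop-vectorize` on Clang). Whether the loop gets vectorized depends on the compiler version and target.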