Comment by brigade

1 year ago

Well that was more true when you had to care about the 8 registers of x86, CPUs were only like 2-4 wide, and codecs preferred to operate on 8x8 blocks and one bitdepth.

Nowadays the impact of suboptimal register allocation and addressing calculations of compilers is almost unmeasurable between having 16/32 registers available and CPUs that are 8-10 wide in the frontend but only 3-4 vector units in the backend. But the added complexity of newer codecs has strained their use of the nasm/gas macro systems to be far less readable or maintainable than intrinsics. Like, think of how unmaintainable complex C macros are and double that.

And it's not uncommon to find asm in ffmpeg or related projects written suboptimally in a way a compiler wouldn't, either because the author didn't fully read/understand CPU performance manuals or because rewriting/twisting the existing macros to fix a small suboptimality is more work than it's worth.

(yes, I have written some asm for ffmpeg in the past)