Comment by brigade
1 year ago
Well, that was more true when you had to care about the 8 registers of x86, CPUs were only like 2-4 wide, and codecs preferred to operate on 8x8 blocks and a single bit depth.
Nowadays the cost of a compiler's suboptimal register allocation and address calculations is almost unmeasurable: there are 16 or 32 registers available, and CPUs are 8-10 wide in the frontend but have only 3-4 vector units in the backend. Meanwhile, the added complexity of newer codecs has strained their use of the nasm/gas macro systems to the point that the asm is far less readable or maintainable than intrinsics. Like, think of how unmaintainable complex C macros are and double that.
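For a concrete sense of the intrinsics side of that comparison, here's a rough sketch. It is not taken from ffmpeg or any real codec: `sad16xh` is a hypothetical helper that computes the sum of absolute differences over a 16-pixel-wide block of 8-bit samples, with the block height as an ordinary function parameter rather than a macro argument. It assumes SSE2 when available and falls back to scalar C otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Hypothetical helper (illustration only): SAD of a 16-wide, h-tall block
 * of 8-bit samples. Register allocation and the per-row address math are
 * left entirely to the compiler. */
static unsigned sad16xh(const uint8_t *a, ptrdiff_t a_stride,
                        const uint8_t *b, ptrdiff_t b_stride, int h)
{
    assert(h >= 0);
#if defined(__SSE2__)
    __m128i acc = _mm_setzero_si128();
    for (int y = 0; y < h; y++) {
        /* Unaligned 16-byte loads of one row from each block. */
        __m128i va = _mm_loadu_si128((const __m128i *)(a + y * a_stride));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + y * b_stride));
        /* psadbw: one partial sum per 64-bit lane. */
        acc = _mm_add_epi64(acc, _mm_sad_epu8(va, vb));
    }
    /* Fold the high 64-bit lane onto the low one and extract the result. */
    acc = _mm_add_epi64(acc, _mm_srli_si128(acc, 8));
    return (unsigned)_mm_cvtsi128_si32(acc);
#else
    /* Scalar fallback for targets without SSE2. */
    unsigned sum = 0;
    for (int y = 0; y < h; y++)
        for (int x = 0; x < 16; x++)
            sum += (unsigned)abs((int)a[y * a_stride + x] - (int)b[y * b_stride + x]);
    return sum;
#endif
}
```

Calling it looks like `sad16xh(src, src_stride, ref, ref_stride, 16)`. In a nasm/gas version, covering several block sizes or bit depths would typically mean threading those parameters through macros, which is where the readability cost shows up.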
And it's not uncommon to find asm in ffmpeg or related projects that's written suboptimally in ways a compiler wouldn't produce, either because the author didn't fully read or understand the CPU performance manuals, or because rewriting or twisting the existing macros to fix a small suboptimality is more work than it's worth.
(yes, I have written some asm for ffmpeg in the past)