Comment by janwas

1 year ago

I'm curious why there are even function calls in time-critical code, shouldn't just about everything be inlined there? And if it's not time-critical, why are we interested in the savings from a custom calling convention?

Binary size was a concern, so excessive inlining was undesirable.

And don't forget that every asm-optimized variant has a C fallback. It serves generic platforms that lack a hand-optimized variant, and checkasm also uses it as the reference when verifying the asm-optimized variant. The fallback might not be linked into your binary/library (the linker eliminates it if it's never used), but the code exists nonetheless.

  • hm, fair enough. IIRC JPEG XL was a few hundred KB of SIMD code for the four or so different targets/ISAs, including the generic fallback, but I can believe video codecs are larger.
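
To make the fallback-plus-verification arrangement concrete, here is a minimal sketch in C. It is not checkasm's actual API or any codec's real code: `sad_c`, `sad_unrolled`, `select_sad`, and `verify_sad` are hypothetical names, and the "optimized" variant is just an unrolled C loop standing in for hand-written assembly. The point is only that both variants exist in the source, one is picked through a function pointer, and the C version doubles as the reference in a test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel signature: sum of absolute differences over n bytes. */
typedef uint32_t (*sad_fn)(const uint8_t *a, const uint8_t *b, size_t n);

/* Absolute difference of two bytes. */
uint32_t abs_diff(uint8_t x, uint8_t y) {
    return (uint32_t)(x > y ? x - y : y - x);
}

/* Plain C reference: always compiled, used as the generic fallback. */
uint32_t sad_c(const uint8_t *a, const uint8_t *b, size_t n) {
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += abs_diff(a[i], b[i]);
    return sum;
}

/* Stand-in for a hand-optimized variant (assumption: real code would be asm). */
uint32_t sad_unrolled(const uint8_t *a, const uint8_t *b, size_t n) {
    uint32_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += abs_diff(a[i], b[i]);
        s1 += abs_diff(a[i + 1], b[i + 1]);
        s2 += abs_diff(a[i + 2], b[i + 2]);
        s3 += abs_diff(a[i + 3], b[i + 3]);
    }
    for (; i < n; i++) /* leftover tail elements */
        s0 += abs_diff(a[i], b[i]);
    return s0 + s1 + s2 + s3;
}

/* Pick a variant once; have_fast stands in for a CPU feature check. */
sad_fn select_sad(int have_fast) {
    return have_fast ? sad_unrolled : sad_c;
}

/* Compare a candidate against the C reference on pseudo-random input
 * for every length from 0 to 67. Returns 1 if all results match. */
int verify_sad(sad_fn candidate) {
    uint8_t a[67], b[67];
    uint32_t state = 12345u; /* simple LCG so the test is deterministic */
    assert(candidate != NULL);
    for (size_t i = 0; i < sizeof a; i++) {
        state = state * 1664525u + 1013904223u;
        a[i] = (uint8_t)(state >> 24);
        state = state * 1664525u + 1013904223u;
        b[i] = (uint8_t)(state >> 24);
    }
    for (size_t n = 0; n <= sizeof a; n++)
        if (candidate(a, b, n) != sad_c(a, b, n))
            return 0;
    return 1;
}
```

Calling `verify_sad(select_sad(1))` checks the stand-in fast path against the reference. Every call through the returned `sad_fn` pointer is an indirect call, which is also why such kernels cannot simply be inlined at the call site.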

Function calls are very fast (unless there's really a lot of parameter copying/saving-to-stack) and if you can re-use a chunk of code from multiple places, you'll reduce pressure on the instruction cache. Inlining is not always ideal.

  • Perhaps the use cases are different (heavily data-parallel), but FWIW I do not remember many cases where we were frontend bound, so icache hasn't been a concern.

Codecs often have many redundant ways of doing the same thing, which are chosen on the basis of which one uses the fewest bits, for a specific piece of data. So you can't inline them as you don't know ahead of time which will be used.
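
A toy sketch of that last point, in C. Nothing here is taken from a real codec: the three predictors, the `choose_mode` helper, and the use of summed absolute residuals as a stand-in for "bits needed" are all assumptions made for illustration. The encoder tries every mode on the actual data and keeps the cheapest, so the function that ends up running is only known at runtime.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_BLOCK 64 /* hypothetical upper bound on block length */

/* Hypothetical predictor signature: fill pred[0..n) from src[0..n). */
typedef void (*predict_fn)(const uint8_t *src, size_t n, uint8_t *pred);

/* Mode 0: predict mid-grey everywhere. */
void predict_flat(const uint8_t *src, size_t n, uint8_t *pred) {
    (void)src;
    for (size_t i = 0; i < n; i++)
        pred[i] = 128;
}

/* Mode 1: predict each sample from its left neighbour. */
void predict_left(const uint8_t *src, size_t n, uint8_t *pred) {
    for (size_t i = 0; i < n; i++)
        pred[i] = (i == 0) ? 128 : src[i - 1];
}

/* Mode 2: linear extrapolation from the two previous samples, clamped. */
void predict_gradient(const uint8_t *src, size_t n, uint8_t *pred) {
    for (size_t i = 0; i < n; i++) {
        if (i == 0) {
            pred[i] = 128;
        } else if (i == 1) {
            pred[i] = src[0];
        } else {
            int v = 2 * src[i - 1] - src[i - 2];
            pred[i] = (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
        }
    }
}

/* Table of candidate modes; the index is what would be signalled. */
static const predict_fn modes[] = { predict_flat, predict_left, predict_gradient };
#define NUM_MODES (sizeof modes / sizeof modes[0])

/* Crude cost proxy (assumption): sum of absolute residuals. */
uint32_t residual_cost(const uint8_t *src, const uint8_t *pred, size_t n) {
    uint32_t cost = 0;
    for (size_t i = 0; i < n; i++)
        cost += (uint32_t)(src[i] > pred[i] ? src[i] - pred[i] : pred[i] - src[i]);
    return cost;
}

/* Try every mode and return the index of the cheapest one.
 * Ties go to the lowest index. Returns -1 if n exceeds MAX_BLOCK. */
int choose_mode(const uint8_t *src, size_t n) {
    uint8_t pred[MAX_BLOCK];
    uint32_t best_cost = UINT32_MAX;
    int best = -1;
    if (n > MAX_BLOCK)
        return -1;
    for (size_t m = 0; m < NUM_MODES; m++) {
        modes[m](src, n, pred); /* indirect call: target depends on m */
        uint32_t cost = residual_cost(src, pred, n);
        if (cost < best_cost) {
            best_cost = cost;
            best = (int)m;
        }
    }
    return best;
}
```

For a steadily rising ramp of samples the gradient mode wins, while a constant block at 128 is cheapest under the flat mode. Because the choice depends on the data, the call goes through the `modes` table rather than to a function known at compile time.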