Comment by fch42

2 years ago

"Sucks" is a strong word, but with respect to return values you're right. The C calling conventions, pretty much everywhere, support what C supports: returning one value. Well, not even that (struct returns ... nope). Kind of a "who'd have thought" in C, I guess. And then there's the C++ argument: "just make it inline then".

On the other hand, memory spills happen. On SPARC, for example, the generous register space (windows) ended up with lots of unused regs in simple functions and a huge, cache-busting stack footprint, definitely so if you ever spilled the register ring. Even with all the mov in x86 (and there is always lots of it, at least in compiled C code) to rearrange data to "where it needed to be", x86 often ended up faster.

When you only look at the callee code (the code generated for a given function signature), it's tempting to say "oh, it'll definitely be fastest if this arg is here and that return there". You don't know the callers, though. There's no guarantee that the argument marshalling will end up "pass-through" or that the returns are consumed "hot". Say, a struct Point { x: i32, y: i32, z: i32 } as arg/return: if the caller does something like mystruct.deepinside.point[i] = func(mystruct.deepinside.point[i]) in a loop, then moving it in and out of regs may be overhead or even prevent vectorisation. But the callee cannot know. Unless... the compiler can see both and inline (back to the C++ excuse). Yes, for JavaScript/Rust-style function call chaining it might be nice/useful "in principle". But in practice only if the compiler has enough caller/callee insight to keep the hot working set "pass-through" (no spills).
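To make the caller-side pattern concrete, here is a rough sketch in Rust. The names (shift_by_value, shift_in_place, update_all_by_value, update_all_in_place) are made up for illustration, and #[inline(never)] is only a stand-in for a call the compiler cannot see through. Which variant is faster depends on the target, the ABI and the surrounding code, so this shows the shape of the problem rather than a benchmark result.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
struct Point {
    x: i32,
    y: i32,
    z: i32,
}

// Hypothetical callee: takes and returns the struct by value.
// `#[inline(never)]` stands in for a call the compiler cannot see through
// (for example, across a library boundary).
#[inline(never)]
fn shift_by_value(p: Point) -> Point {
    Point { x: p.x + 1, y: p.y + 1, z: p.z + 1 }
}

// Same operation, but updating the caller's memory in place.
#[inline(never)]
fn shift_in_place(p: &mut Point) {
    p.x += 1;
    p.y += 1;
    p.z += 1;
}

// Caller pattern from the comment: the data lives in memory inside a
// larger structure; each element is loaded, passed, returned and stored back.
fn update_all_by_value(points: &mut [Point]) {
    for p in points.iter_mut() {
        *p = shift_by_value(*p);
    }
}

// Alternative caller: hands the callee a reference to the element instead.
fn update_all_in_place(points: &mut [Point]) {
    for p in points.iter_mut() {
        shift_in_place(p);
    }
}
```

Both loops compute the same result. The difference is only in how the data travels across the call boundary, which is exactly the part the callee cannot choose on the caller's behalf.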

The lowest-hanging fruit on calling is probably to remove the "functions return one primitive thing" assumption that's ingrained in the C ABIs almost everywhere. For the rest? A lot of benchmarking and code generation statistics. I'd love to see more of that. Even if it's dry stuff.
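As a small illustration of what "one primitive thing" means at the source level, compare the classic out-parameter workaround with a language-level multiple return. The function names below are hypothetical. Note that Rust's own (non-extern) ABI is unspecified, so whether the tuple actually comes back in registers is up to the compiler and the target.

```rust
// C-ABI style: one value comes back as the return value, the second
// is written through a reference into caller-provided memory.
fn min_max_out(a: i32, b: i32, max_out: &mut i32) -> i32 {
    if a <= b {
        *max_out = b;
        a
    } else {
        *max_out = a;
        b
    }
}

// Language-level multiple return: both values are part of the result,
// returned as (min, max).
fn min_max(a: i32, b: i32) -> (i32, i32) {
    if a <= b { (a, b) } else { (b, a) }
}
```

The first form forces the second result through memory by construction. The second form at least leaves the compiler free to keep both values in registers.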

> Well, not even that (struct returns ... nope).

C compilers actually pack small struct return values into registers:

https://godbolt.org/z/51q5se86s

The limit is just that on x86-64, GCC and Clang use up to two registers while MSVC uses only one.
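The same thing can be sketched from Rust, assuming a #[repr(C)] struct and an extern "C" function so that the platform's C calling convention applies. DivMod and div_mod are hypothetical names. As far as I know, the System V x86-64 ABI returns a 16-byte struct of two integers like this one in RAX and RDX, while the Microsoft x64 convention only returns aggregates of 1, 2, 4 or 8 bytes in RAX and passes larger ones back through a hidden pointer. Check the generated assembly for your target rather than taking that on faith.

```rust
// Hypothetical 16-byte aggregate of two integers, laid out as C would.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct DivMod {
    pub quot: u64,
    pub rem: u64,
}

// `extern "C"` asks rustc to use the platform's C calling convention for
// this function, so how the struct comes back depends on the target.
// The caller must pass d != 0; division by zero panics.
#[inline(never)]
pub extern "C" fn div_mod(n: u64, d: u64) -> DivMod {
    DivMod { quot: n / d, rem: n % d }
}
```

A call such as div_mod(17, 5) yields quot = 3 and rem = 2 on every target. Only the way the two fields travel back to the caller differs.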

Also, IMHO there is no such thing as a "C calling convention", there are many different calling conventions that are defined by the various runtime environments (usually the combination of CPU architecture and operating system). C compilers simply have to adhere to those CPU+OS calling conventions, like any other language that wants to interact directly with the operating system.

IMHO the whole performance angle is a bit overblown though, for 'high frequency functions' the compiler should inline the function body anyway. And for situations where that's not possible (e.g. calling into DLLs), the DLL should expose an API that doesn't require such 'high frequency functions' in the first place.

  • > Also, IMHO there is no such thing as a "C calling convention", there are many different calling conventions [ ... ]

    I did not say that. I said "C calling conventions" (plural). Rather aware of the fact that the devil is in the detail here ... heck, if you want it all, back in the bad old days, even the same compiler supported/used multiple ("fastcall" & Co, or on Win 3.x "pascal" for system interfaces, or the various ARM ABIs, ...).

    • Clang still has some alternative calling conventions via __attribute__((X)) for individual functions with a bunch of options[0], though none just extend the set of arguments passed via GPRs (closest seems to be preserve_none with 12 arguments passed by register, but it also unconditionally gets rid of all callee-saved registers; preserve_most is nice for rarely-taken paths, though until clang-17 it was broken on functions which returned things).

      [0]: https://clang.llvm.org/docs/AttributeReference.html#calling-...