Comment by zamalek

5 hours ago

> Another reason is compilation time. The more complicated the function signatures, the more prologue/epilogue code we have to generate that LLVM has to chew on. [...]

I know that LLVM completely dominates compilation time, but maybe the improvements from this could make the other bits (i.e. compiling rustc with callconv=fast) fast enough to make up the difference?

1 comment

zamalek

jcranmer 4 hours ago

I am extremely skeptical that that would be the case. Local stack accesses are pretty guaranteed to be L1 cache hits, and if any memory access can be made fast, it's accesses to the local stack. The general rule of thumb for performance engineering is that you're optimizing for L2 cache misses if you can't fit in L2 cache, so overall, I'd be shocked if this convoluted calling convention could eke out more than a few percent improvement, and even 1% I'm skeptical of. Meanwhile, making 14-argument functions is going to create a lot of extra work in several places for LLVM that I can think of (for starters, most of the SmallVector<Value *, N> handling is choosing 4 or 8 for N, so there's going to be a lot of heap-allocate a 14-element array going on), which will more than eat up the gains you'd be expecting.