Comment by kimixa
19 hours ago
Having worked on GPUs from the latter part of that era, the "frontend" of the shader compiler was a pretty small fraction of the total time cost - most of it was in the later optimization passes, which were often extremely hardware-specific (so not really possible at the level of DXBC). Especially as hardware started to move away from the assumptions DXBC was designed around.
I think a big part of the user-visible difference in stutter is simply the expected complexity of shaders and the number of different shaders in an "average" scene - they're hundreds of times larger, and CPUs aren't hundreds of times faster (and many of the optimization algorithms involved are worse than linear in time versus input size, too).
Modern DXIL and SPIR-V are at a similar level of abstraction to DXBC, and certainly don't "solve" stutter.
> One advantage of contemporary bytecode implementations is that many optimizations can occur in the “middle end”—which is to say on the IR itself, before lowering to ISA.
Yes, many optimizations can be done at the vendor-neutral IR level, but my point is that on GPUs those tend to be the computationally cheaper ones. The vast majority of the compiler's time (in my experience) was spent at levels lower than that, like register allocation (on GPUs "registers" are normally shared across all waves, so there's a trade-off between using fewer registers per wave and allowing more waves in flight, for example), or reordering instructions to hide latency from asynchronous units or higher-latency instructions. And all of those are very hardware-specific.
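To make the register-vs-waves point concrete, here's a rough CUDA sketch (CUDA is just the most convenient way to show it in source form - graphics shader backends face the same budget, and the kernel, names, and launch-bounds numbers here are all made up for illustration). Capping register use to fit more waves can cost spills; letting register use grow keeps values in registers but leaves fewer waves to hide latency:

```cuda
#include <cstdio>

// Hypothetical kernel: __launch_bounds__(256, 4) asks for 256 threads/block and
// at least 4 resident blocks per SM, which nudges the compiler to limit registers
// per thread (possibly spilling) so more waves fit and memory latency is hidden.
// Dropping the hint lets the compiler keep more values in registers, but fewer
// waves fit, so latency is hidden less well. The same tension exists in graphics
// shader backends, where the register file is shared across all waves.
__global__ void __launch_bounds__(256, 4)
saxpy_like(const float* __restrict__ x, const float* __restrict__ y,
           float* __restrict__ out, float a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = a * x[i] + y[i];   // trivial body; real shaders keep far more values live
}

int main()
{
    const int n = 1 << 20;
    float *x, *y, *out;
    cudaMallocManaged(&x,   n * sizeof(float));
    cudaMallocManaged(&y,   n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    saxpy_like<<<(n + 255) / 256, 256>>>(x, y, out, 3.0f, n);
    cudaDeviceSynchronize();
    printf("out[0] = %f\n", out[0]);   // expect 5.0

    cudaFree(x); cudaFree(y); cudaFree(out);
    return 0;
}
```

(The same knob exists globally as nvcc's -maxrregcount flag; either way it's the backend, not the vendor-neutral IR, that has to resolve the trade-off.)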
It's a classic example of the "first 50%" being relatively easy - an "optimizing" compiler can get pretty good with fairly simple constant propagation, inlining, and dead code elimination. But that second 50% takes so much more effort.
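A toy sketch of that "easy half" (my own illustration, not taken from any real driver): the kind of rewrite cheap, target-independent passes do on any IR, long before the hardware-specific backend gets involved.

```cuda
#include <cstdio>

// Toy "before" function, roughly as a generic middle end might receive it
// after translation from DXIL/SPIR-V. Nothing here needs target knowledge.
static float before(float x)
{
    const float scale = 2.0f;
    const bool  debug = false;       // compile-time constant
    float dead = x * 123.0f;         // never read -> dead code elimination
    (void)dead;
    float s = scale * scale;         // constant folding -> 4.0f
    if (debug)                       // branch on a known constant -> folded away
        return -1.0f;
    return s * x + 1.0f;
}

// Roughly what constant propagation + branch folding + DCE leave behind.
// Register allocation, latency-hiding scheduling, and occupancy trade-offs
// all still lie ahead, in the hardware-specific backend.
static float after(float x)
{
    return 4.0f * x + 1.0f;
}

int main()
{
    printf("%f %f\n", before(3.0f), after(3.0f));  // both print 13.0
    return 0;
}
```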