Comment by CrossVR

2 days ago

To answer that question in the context of Vulkan, I highly recommend reading the proposal document for VK_EXT_shader_object. Its problem statement has a great explanation of how modern graphics APIs ended up in this situation:

https://github.com/KhronosGroup/Vulkan-Docs/blob/main/propos...

The gist of it is that graphics APIs like DX11 were designed around pipelines being compiled in pieces, each piece representing a different stage of the pipeline. These pieces are then linked together at runtime just before the draw call. However, the pieces are rarely a perfect fit, so the driver has to patch them or do further compilation, which can introduce stuttering.

In an attempt to reduce stuttering and driver complexity, Vulkan did away with these piecemeal pipelines and opted for monolithic pipeline objects. This allowed the application to pre-compile the full pipeline ahead of time, relieving the driver of having to piece the pipeline together at the last moment.

If implemented correctly, you can make a game with virtually no stuttering. DOOM (2016) is a good example: the number of pipeline variants was kept low so they could all be pre-compiled, and its gameplay greatly benefits from the stutter-free experience.

This works great for a highly specialized engine with a manageable number of pipeline variants, but for more versatile game engines and for most emulators, pre-compiling all pipelines is untenable: the number of permutations between the different variations of each pipeline stage is simply too great. For these applications there was no other option than to compile the full pipeline on demand and cache the result, making the stutter worse than before, since there is no way to do piecemeal compilation of the pipeline ahead of time.
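To put rough numbers on that, here is a minimal Python sketch of the permutation problem and of the compile-on-demand-and-cache fallback. The stage names and variant counts are made up for illustration, and `OnDemandPipelineCache` is a hypothetical helper, not a Vulkan API.

```python
from math import prod

# Hypothetical variant counts per pipeline piece; illustrative numbers only.
VARIANTS = {
    "vertex_shader": 40,
    "fragment_shader": 200,
    "blend_state": 8,
    "depth_state": 4,
    "vertex_layout": 6,
}

def monolithic_pipeline_count(variants):
    """Worst case: every combination needs its own fully compiled pipeline."""
    return prod(variants.values())

def piecemeal_compile_count(variants):
    """Piecemeal model: each piece is compiled once and linked later."""
    return sum(variants.values())

class OnDemandPipelineCache:
    """Compile a full pipeline the first time a state combination is drawn."""

    def __init__(self, compile_fn):
        self._compile = compile_fn  # stand-in for the expensive driver compile
        self._cache = {}
        self.misses = 0             # each miss is a potential stutter

    def get(self, state):
        key = tuple(sorted(state.items()))
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._compile(state)
        return self._cache[key]

print(monolithic_pipeline_count(VARIANTS))  # 1536000
print(piecemeal_compile_count(VARIANTS))    # 258
```

With these assumed counts, the monolithic model has about 1.5 million possible pipelines against 258 individual pieces. A real game only ever draws a fraction of the combinations, but you generally can't know which ones ahead of time, so every cache miss lands in the middle of gameplay.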

This gets even worse for emulators that attempt to emulate systems where the pipeline is implemented in fixed-function hardware rather than programmable shaders. On those systems the games don't compile any piece of the pipeline; the game simply writes to a few registers to set the pipeline state right before the draw call. Even piecemeal compilation won't help much here, so ubershaders were used instead to emulate a great number of hardware states in a single pipeline.
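The ubershader idea can be sketched in a few lines of Python. This is only an illustration: the register names and combiner modes are hypothetical, and a real ubershader would be GPU shader code reading the emulated registers from a uniform buffer. The point is that the state becomes runtime input to one pre-compiled program instead of selecting a separately compiled pipeline.

```python
# Hypothetical fixed-function combiner modes of an emulated GPU.
COMBINE_MODULATE, COMBINE_ADD, COMBINE_REPLACE = 0, 1, 2

def ubershader_fragment(regs, vertex_color, texel):
    """One 'pipeline' that branches on emulated register state per draw.

    Colors are RGBA tuples of floats in [0, 1]. Returns the output color,
    or None if the fragment is discarded by the alpha test.
    """
    mode = regs["combine_mode"]
    if mode == COMBINE_MODULATE:
        color = tuple(v * t for v, t in zip(vertex_color, texel))
    elif mode == COMBINE_ADD:
        color = tuple(min(1.0, v + t) for v, t in zip(vertex_color, texel))
    else:  # COMBINE_REPLACE
        color = tuple(texel)

    # Alpha test is also just a runtime branch on register state.
    if regs["alpha_test_enabled"] and color[3] < regs["alpha_ref"]:
        return None
    return color

# The "game" writes registers right before the draw; nothing is recompiled.
regs = {"combine_mode": COMBINE_MODULATE, "alpha_test_enabled": False, "alpha_ref": 0.0}
print(ubershader_fragment(regs, (1.0, 0.5, 0.25, 1.0), (0.5, 0.5, 0.5, 1.0)))
```

The trade-off is that every draw pays for the branching, so the ubershader runs slower than a specialized pipeline would, but it never has to stop and compile.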