Comment by an-unknown
2 days ago
I think there is some confusion about "ubershaders" in the context of emulators in particular. Old Nintendo consoles like the N64 or the GameCube/Wii didn't have programmable shaders. Instead, they had a mostly fixed-function pipeline whose stages could be configured to approximate "programmable" shaders, at least to some degree. The problem is, you have no idea what any particular game is going to do until the moment it writes a specific configuration value into a specific GPU register, which instantly reconfigures the GPU to do whatever the game wants from that moment onward. There is literally no "shader" stored in the ROM; it's just code configuring (parts of) the GPU directly.
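To sketch what that means for an emulator (all names and register addresses here are invented, not from any real console or emulator): the full register state at draw time effectively *is* the shader, so the emulator has to derive a shader identity from it.

```python
# Hypothetical sketch: a GPU register write instantly changes pipeline
# behavior, so the emulator treats the whole register block as the key
# identifying "which shader the game is asking for right now".
from dataclasses import dataclass, field

@dataclass
class PipelineState:
    """Tiny stand-in for the real (much larger) configuration registers."""
    regs: dict = field(default_factory=dict)

    def write(self, addr: int, value: int):
        # The game writes a value; from this moment the GPU behaves differently.
        self.regs[addr] = value

    def config_key(self) -> tuple:
        # Two identical configurations must map to the same generated shader.
        return tuple(sorted(self.regs.items()))

state = PipelineState()
state.write(0xC0, 0x08F2F0)   # e.g. a combiner-stage setting (made-up values)
state.write(0xC1, 0x08F2F0)
key = state.config_key()
```

The point of the key is only that identical configurations collapse to one cache entry; the real register blocks on these consoles are of course far larger.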
That's not how any modern GPU works, though. Instead, you have to emulate this semi-fixed-function pipeline with shaders. Emulators try to generate shader code for the current GPU configuration and compile it, but that takes time and can only happen after the configuration has been observed for the first time. This is where "ubershaders" enter the scene: an ubershader is a single huge shader that implements the complete configurable semi-fixed-function pipeline, so you pass the configuration registers to the shader and it acts accordingly. Unfortunately, such shaders are huge and slow, so you don't want to use them unless necessary. The idea is to keep the ubershader as a fallback: use it whenever you see a new configuration, compile the specialized shader in the background and cache it, then switch to the compiled shader once it's available to regain performance. A few years ago, the developers of the Dolphin emulator (GameCube/Wii) wrote an extensive blog post about how this works: https://de.dolphin-emu.org/blog/2017/07/30/ubershaders/
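The fallback policy described above can be sketched like this (a minimal illustration with invented names, not Dolphin's actual code): render with the ubershader while the specialized shader compiles off the render thread, then switch over.

```python
# Sketch of the ubershader fallback: never block the render thread on
# compilation; serve the slow-but-universal ubershader until the
# specialized shader for this configuration is ready and cached.
import threading

class ShaderCache:
    def __init__(self, compile_fn):
        self.compile_fn = compile_fn   # slow: generates + compiles a shader
        self.cache = {}                # config key -> compiled shader
        self.pending = set()           # keys currently compiling
        self.lock = threading.Lock()

    def _compile(self, key):
        shader = self.compile_fn(key)  # runs on a background thread
        with self.lock:
            self.cache[key] = shader
            self.pending.discard(key)

    def shader_for(self, key):
        with self.lock:
            if key in self.cache:
                return self.cache[key]     # fast path: specialized shader
            if key not in self.pending:    # first sighting: start compiling
                self.pending.add(key)
                threading.Thread(target=self._compile, args=(key,)).start()
        return "ubershader"                # fallback: slow, but no stutter

def fake_compile(key):
    return f"specialized-{key}"

cache = ShaderCache(fake_compile)
first = cache.shader_for(42)   # returns "ubershader" while compiling
```

The key design point is that `shader_for` never waits: a frame rendered with the ubershader is slower than one with the specialized shader, but it is still a frame, which is exactly the trade-off the Dolphin post describes.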
Only with the 3DS/Wii U did Nintendo consoles finally get "real" programmable shaders, in which case you "just" have to translate them to whatever your host system needs. You still won't know which shaders you'll see until you observe the transfer of the compiled shader code to the emulated GPU. After all, the shader code is compiled ahead of time to GPU instructions, usually during the build process of the game itself; at least for Nintendo consoles, there are SDK tools to do this. This, of course, means there is no shader compilation happening on the console itself, so there is no stutter caused by it either. An emulator of such a console, on the other hand, has to translate and recompile those shaders on the fly.
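A rough sketch of that situation (invented names, with a trivial placeholder where a real guest-ISA-to-host-shader translator would go): the emulator only learns about a shader when the game uploads the precompiled binary, and it caches the translation keyed by the binary's hash so each shader stutters at most once.

```python
# Hypothetical sketch: the game ships precompiled GPU binaries, so the
# emulator first sees a shader at upload time and must translate it then.
import hashlib

translated = {}  # hash of guest binary -> translated host shader

def translate_guest_shader(binary: bytes) -> str:
    # Placeholder for a real translator (guest GPU ISA -> host GLSL/SPIR-V).
    return f"host shader ({len(binary)} bytes of guest code)"

def on_shader_upload(binary: bytes) -> str:
    key = hashlib.sha1(binary).hexdigest()
    if key not in translated:                  # first sighting: stutter here
        translated[key] = translate_guest_shader(binary)
    return translated[key]                     # later sightings: cache hit

a = on_shader_upload(b"\x01\x02\x03")
b = on_shader_upload(b"\x01\x02\x03")          # same binary, no retranslation
```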
> How come this was never a problem for older [...] emulators?
Older emulators had highly inaccurate and/or slow GPU emulation, so this wasn't really a problem for a long time. Only once GPU emulation became accurate enough, with dynamically generated shaders for high performance, did shader compilation stutter become a real problem.
> Old Nintendo consoles like the N64 or the GameCube/Wii didn't have programmable shaders.
The N64 did in fact have a fully programmable pipeline. [1] At boot, the game initialized the RSP (the N64’s GPU) with “microcode”, which was a program that implemented the RSP’s graphics pipeline. During gameplay, the game uploaded “display lists” of opcodes to the GPU which the microcode interpreted. (I misspoke earlier by referring to these opcodes as “microcode”.) For most of the console’s lifespan, game developers chose between two families of vendor-authored microcode: Fast3D and Turbo3D. Toward the end, some developers (notably Factor5) wrote their own microcode.
[1]: https://www.copetti.org/writings/consoles/nintendo-64/
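The microcode-as-interpreter idea can be sketched like this. This is an illustrative simplification only: the command names echo real display list mnemonics, but their operands and semantics here are invented, not the actual Fast3D encoding.

```python
# Conceptual sketch: RSP microcode acts as an interpreter, and the game's
# display list is its input program. Opcode semantics are simplified.
def run_display_list(display_list):
    triangles = []
    vertex_buf = {}                    # stand-in for RSP vertex memory
    for op, *args in display_list:
        if op == "G_VTX":              # load a vertex into a buffer slot
            slot, vertex = args
            vertex_buf[slot] = vertex
        elif op == "G_TRI":            # emit one triangle from three slots
            a, b, c = args
            triangles.append((vertex_buf[a], vertex_buf[b], vertex_buf[c]))
        elif op == "G_ENDDL":          # end of this display list
            break
    return triangles

tris = run_display_list([
    ("G_VTX", 0, (0, 0, 0)),
    ("G_VTX", 1, (1, 0, 0)),
    ("G_VTX", 2, (0, 1, 0)),
    ("G_TRI", 0, 1, 2),
    ("G_ENDDL",),
])
```

Because the interpreter itself is uploaded at boot, a different microcode can give the same opcodes different meanings entirely, which is why custom microcode (as Factor 5 wrote) was possible at all.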
Microcode was only used for the RSP, which was a modified MIPS coprocessor and could only realistically be used for T&L. The RSP then sent triangles to the RDP for rasterization, the pixel pipeline, and blending, all of which were fixed-function, admittedly with some fairly flexible color combiner functionality.
I appreciate the correction. Still, programmable T&L was kind of a big deal. PC GPUs didn't get hardware T&L until the DX7 era, and programmable vertex shaders only arrived with DX8/Shader Model 1.x, maturing with DX9/Shader Model 2.0.