Comment by flohofwoe
11 days ago
> Render Passes are entirely pointless complexity that should not exist. It's even optional in Vulkan nowadays.
AFAIK Vulkan only eliminated pre-baked render pass objects (which were indeed pointless) and simply copied Metal's design of transient render passes. There are still 'render pass boundaries' between vkCmdBeginRendering() and vkCmdEndRendering(), and the VkRenderingInfo struct that's passed into vkCmdBeginRendering() (https://registry.khronos.org/vulkan/specs/latest/man/html/Vk...) is equivalent to Metal's MTLRenderPassDescriptor (https://developer.apple.com/documentation/metal/mtlrenderpas...).
I.e. even modern Vulkan still has render passes, they just didn't want to call the new functions 'Begin/EndRenderPass' for some reason ;) AFAIK the idea of render pass boundaries is quite essential for tiler GPUs.
WebGPU pretty much tries to copy Metal's render pass approach as much as possible (e.g. it doesn't have pre-baked pass objects like Vulkan 1.0).
> The one thing that WebGPU is doing better is that it does implicit syncing by default.
AFAIK also mostly thanks to the 'transient render pass model'.
> Why would I need that anyway, the shader/kernel knows all about the data, the host doesnt need to know.
Because old GPUs are a thing, and those usually don't have a hardware design flexible enough to make rasterizing (or even vertex pulling) in compute shaders performant enough to compete with the traditional render pipeline.
> Similarly static binding groups are entirely pointless
I agree, but AFAIK Vulkan 1.0's descriptor model is mostly to blame for the inflexible BindGroup design.
> but that's also made needlessly cumbersome in WebGPU due to the requirement to use staging buffers
Most modern 3D APIs also switched to staging buffers though, and I guess there's not much choice if you don't have unified memory.
> AFAIK the idea of render pass boundaries is quite essential for tiler GPUs.
I've been told by a driver dev of a tiler GPU that they are, in fact, not essential. They pick that info up by themselves by analyzing the command buffer.
> Most modern 3D APIs also switched to staging buffers though, and I guess there's not much choice if you don't have unified memory.
Well, I wouldn't know since I switched to using CUDA as a graphics API. It's mostly nonsense-free, faster than the hardware pipeline for points, and about as fast for splats. Seeing how Nanite also software-rasterizes as a performance improvement, CUDA may even be great for triangles. So far I've only implemented a rudimentary triangle rasterizer that can draw 10 million small textured triangles per millisecond. I'm still working on the larger ones, but that's low-priority since I focus on point clouds.
In any case, I won't touch graphics APIs anymore until they make a clean break to remove the legacy nonsense. Allocating buffers should be a single line, providing data to shaders should be as simple as passing pointers, etc.
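
For readers wondering what a "rudimentary triangle rasterizer" boils down to, here's a minimal CPU-side sketch in C++ of the standard edge-function approach. To be clear, this is not the CUDA code described above: the names (Vec2, edgeFunction, rasterizeTriangle) are made up for illustration, it only writes a coverage mask (no textures, no depth), and it skips the top-left fill rule, so pixels on a shared edge would be drawn by both triangles. A GPU version would presumably map triangles or pixels to threads instead of looping.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical 2D point type in pixel coordinates.
struct Vec2 { float x, y; };

// Twice the signed area of triangle (a, b, p).
// The sign tells on which side of the directed edge a -> b the point p lies.
inline float edgeFunction(Vec2 a, Vec2 b, Vec2 p) {
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

// Marks every pixel whose center is inside (or exactly on an edge of) the
// triangle in a width*height coverage buffer. Returns the covered pixel count.
int rasterizeTriangle(std::vector<uint8_t>& fb, int width, int height,
                      Vec2 v0, Vec2 v1, Vec2 v2) {
    const float area = edgeFunction(v0, v1, v2);
    if (area == 0.0f) return 0; // degenerate triangle, nothing to draw

    // Only visit pixels inside the triangle's bounding box, clamped to the buffer.
    const int minX = std::max(0, (int)std::floor(std::min({v0.x, v1.x, v2.x})));
    const int maxX = std::min(width - 1, (int)std::ceil(std::max({v0.x, v1.x, v2.x})));
    const int minY = std::max(0, (int)std::floor(std::min({v0.y, v1.y, v2.y})));
    const int maxY = std::min(height - 1, (int)std::ceil(std::max({v0.y, v1.y, v2.y})));

    int covered = 0;
    for (int y = minY; y <= maxY; ++y) {
        for (int x = minX; x <= maxX; ++x) {
            const Vec2 p{x + 0.5f, y + 0.5f}; // sample at the pixel center
            const float w0 = edgeFunction(v1, v2, p);
            const float w1 = edgeFunction(v2, v0, p);
            const float w2 = edgeFunction(v0, v1, p);
            // Accept both windings: inside means all three edge values
            // have the same sign as the triangle's area (zero counts as inside).
            const bool inside = area > 0.0f
                ? (w0 >= 0.0f && w1 >= 0.0f && w2 >= 0.0f)
                : (w0 <= 0.0f && w1 <= 0.0f && w2 <= 0.0f);
            if (inside) {
                fb[y * width + x] = 1;
                ++covered;
            }
        }
    }
    return covered;
}
```

Dividing w0, w1 and w2 by the area gives barycentric coordinates, which is where texture coordinate interpolation would hook in. For small triangles the bounding box is only a few pixels, which is probably why that case is the easy one to make fast.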