
Comment by Animats

8 months ago

> Bindless is pretty much _the_ most important feature we need in WebGPU. Other stuff can be worked around to varying degrees of success, but lack of bindless makes our state changes extremely frequent, which heavily kills performance with how expensive WebGPU makes changing state.

Yes.

This has had a devastating effect on Rust 3D graphics. The main crate for doing 3D graphics in Rust is WGPU. WGPU supports not just WebGPU but also Vulkan, Metal, DirectX 12, and OpenGL (including on Android), and it makes them all look much like Vulkan. Bevy, Rend3, and Renderling, the next level up, all use WGPU. It's so convenient.

WGPU has lowest-common-denominator support. If WebGPU can't do something inside a browser, then WGPU probably can't do it on other platforms that could handle it. So WGPU makes your gamer PC perform like a browser or a phone: no bindless, no multiple queues, and somewhat inefficient binding and allocation.
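To make the "frequent state changes" complaint concrete, here is a minimal sketch (illustrative types and names, not any particular engine's code) of the per-draw rebinding that WebGPU's binding model forces today:

```rust
struct Object {
    material_bind_group: wgpu::BindGroup,
    vertex_buffer: wgpu::Buffer,
    vertex_count: u32,
}

// One set_bind_group per object: each call looks cheap, but the
// implementation must re-validate layouts and lifetimes every time.
fn draw_scene(pass: &mut wgpu::RenderPass<'_>, objects: &[Object]) {
    for obj in objects {
        pass.set_bind_group(0, &obj.material_bind_group, &[]);
        pass.set_vertex_buffer(0, obj.vertex_buffer.slice(..));
        pass.draw(0..obj.vertex_count, 0..1);
    }
    // With bindless, one large descriptor array would be bound once, and
    // each draw would index into it, eliminating the per-object rebinding.
}
```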

This is one reason we don't see high-performance games written in Rust.

After four years of development, WGPU performance has gone down, not up. When it dropped 21% recently and I pointed that out, some people were very annoyed.[1]

Google pushing bindless forward might help get this unstuck. Although notice that the target date on their whiteboard is December 2026. I'm not sure that game dev in Rust has that much runway left. Three major projects have been cancelled and the main site for Rust game dev stopped updating in June 2024.[2]

[1] https://github.com/gfx-rs/wgpu/issues/6434

[2] https://gamedev.rs/

> This is one reason we don't see high-performance games written in Rust.

Rendering is _hard_, and Rust is an uncommon toolchain in the gamedev industry. I don't think wgpu has much to do with it. Vulkan via ash and DirectX12 via windows-rs are both great options in Rust.

> After four years of development, WGPU performance has gone down, not up. When it dropped 21% recently and I pointed that out, some people were very annoyed.[1]

Performance isn't the top priority at the moment for most of the wgpu maintainers (the ones who are paid by Mozilla). Fixing bugs and implementing missing features so that they can ship WebGPU support in Firefox is more important. The other maintainers are volunteers with no obligation beyond finding the work enjoyable. Performance can always be improved later, but getting working WebGPU support to users, so that websites can start targeting it, is crucial. The annoyance is that you were rude about it.

> Google pushing bindless forward might help get this unstuck. Although notice that the target date on their whiteboard is December 2026.

The bindless stuff is basically "developers requested it a ton when we asked for feedback on features they wanted (I was one of those people who gave them feedback), and we had some draft proposals from (iirc) 1-2 different people". It's wanted, but there are still major questions to answer. It's not like this is a settled thing they've been developing and are preparing to release. All the features listed are just feedback from users and discussion that took place at the WebGPU face-to-face recently.

  • WGPU dev here. I agree with everything JMS55 says here, but I want to avoid a potential misunderstanding. Performance is definitely a priority for WGPU, the open source project. Much of WGPU's audience is very concerned with performance.

    My team at Mozilla are active contributors to WGPU. For the moment, when we Mozilla engineers are prioritizing our own work, we are focused on compatibility and safety, because that's what we need most urgently for our use case. Once we have shipped WebGPU in Firefox, we will start putting our efforts into other things like performance, developer experience, and so on.

    But WGPU has other contributors with other priorities. For example, WGPU just merged some additions to its nascent ray tracing support. That's not a Mozilla priority, but WGPU took the PR. Similarly for some recent extensions to 64-bit atomics (which I think is used by Bevy for Nanite-like techniques?), and other areas.

    WGPU is an open source project. We at Mozilla contribute to the features we need; other people contribute to what they care about; and the overall direction of the project is determined by what capable contributors put in the time to make happen.

    • > But WGPU has other contributors with other priorities. For example, WGPU just merged some additions to its nascent ray tracing support. That's not a Mozilla priority, but WGPU took the PR. Similarly for some recent extensions to 64-bit atomics (which I think is used by Bevy for Nanite-like techniques?), and other areas.

      Yep! The 64-bit atomic stuff let me implement software rasterization for our Nanite-like renderer - it was a huge win. Same for ray tracing: I'm using it to develop an RT DI/GI solution for Bevy. Both were really exciting additions.
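      For the curious, the core of the 64-bit atomic trick looks roughly like this - a hedged sketch only, since 64-bit WGSL types are a wgpu/naga extension and the exact syntax and feature flags vary by version:

      ```rust
      // Pack depth into the high 32 bits and a payload (e.g. a triangle
      // or visibility id) into the low 32 bits, so a single atomicMax per
      // pixel keeps the closest fragment.
      const SOFT_RASTER_WGSL: &str = r#"
      @group(0) @binding(0)
      var<storage, read_write> visibility: array<atomic<u64>>;

      fn write_pixel(pixel: u32, depth: f32, id: u32) {
          // Larger packed value wins, so the nearest fragment's id survives.
          let packed = (u64(bitcast<u32>(depth)) << 32u) | u64(id);
          atomicMax(&visibility[pixel], packed);
      }
      "#;

      // Feature names as of recent wgpu versions (native-only):
      fn required_features() -> wgpu::Features {
          wgpu::Features::SHADER_INT64 | wgpu::Features::SHADER_INT64_ATOMIC_MIN_MAX
      }
      ```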

      How performant and featureful wgpu is comes down mostly to resources, in my view. Like with Bevy, it's up to contributors. The unfortunate reality is that if I'm busy working on Bevy, I don't have any time for wgpu. So I'm thankful for the people who _do_ put time into wgpu, so that I can continue to improve Bevy.

  • > Rendering is _hard_, and Rust is an uncommon toolchain in the gamedev industry. I don't think wgpu has much to do with it. Vulkan via ash and DirectX12 via windows-rs are both great options in Rust.

    Yes. I think I'm beginning to see what's gone wrong with the Rust crates. It's an architectural problem. Vulkano and WGPU try to create a Rust safety perimeter at an API that's basically a wrapper around Vulkan. This may be the wrong boundary for that safety perimeter.

    Moving buffer allocation inside the safety perimeter may eliminate a level of locking and checking. Bindless really brings this out, because somebody has to keep the descriptor table and buffer allocation in sync. The GPU depends on that. So that has safety implications.

    If this problem is partitioned differently, the locking problems for concurrent GPU content updating may become simpler. Right now, both Vulkano and WGPU force more serialization than Vulkan itself requires. The rendering thread is too often stalled on a lock, waiting for some content-updating operation that should not interfere with rendering.
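    To illustrate the repartitioning idea (all names hypothetical, not Vulkano's or WGPU's actual types): if a single lock owns both the slot allocator and the descriptor table, the invariant the GPU depends on can never be observed broken, and the checking happens once at allocation time rather than per draw. A minimal sketch:

    ```rust
    use std::sync::Mutex;

    /// Hypothetical bindless table: the invariant "descriptor entry i
    /// describes buffer i" is protected by one lock, so allocation and
    /// the descriptor table can never get out of sync.
    struct BindlessTable {
        inner: Mutex<TableState>,
    }

    struct TableState {
        free_slots: Vec<u32>,
        // Stand-in for the GPU descriptor array this must mirror.
        entries: Vec<Option<BufferEntry>>,
    }

    struct BufferEntry {
        size: u64,
        // ... GPU handle, debug name, etc. would live here.
    }

    impl BindlessTable {
        /// Allocate a slot and publish its descriptor in one critical section.
        fn allocate(&self, entry: BufferEntry) -> Option<u32> {
            let mut state = self.inner.lock().unwrap();
            let slot = state.free_slots.pop()?;
            state.entries[slot as usize] = Some(entry);
            // The GPU descriptor write for `slot` would be queued here,
            // inside the same lock, so the two views cannot diverge.
            Some(slot)
        }
    }
    ```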

    Too much detail for this forum. I'll continue this elsewhere. This has been useful.

    • Back in the day I made a similar error, wrapping C graphics libraries 1:1 with improved C++ bindings, until I realised it was more ergonomic to think in higher-level C++ abstractions and expose those concepts instead, fully hiding the underlying unsafe C APIs.

  • > implementing missing features so that they can ship WebGPU support in Firefox

    Sounds like WGPU, the project, should be detached from Firefox?

    To me the priority of shipping WGPU on FF is kind of mind-boggling, as I consider the browser irrelevant at this point in time.

There have been a bunch of significant improvements to WGPU's performance over the last few years.

* Before the major rework called "arcanization", `wgpu_core` used a locking design that caused huge amounts of contention in any multi-threaded program. It took write locks so often I doubt you could get much parallelism at all out of it. That's all been ripped out, and we've been evolving steadily towards a more limited and reasonable locking discipline.

* `wgpu_core` used to have a complex system of "suspected resources" and deferred cleanup, apparently to try to reduce the amount of work that needed to be done when a command buffer finished executing on the GPU. This turned out not to actually save any work at all: it did exactly the same amount of bookkeeping, just at a different time. We ripped out this complexity and got big speedups on some test cases.

* `wgpu_core` used to use Rust generics to generate, essentially, a separate copy of its entire code for each backend (Vulkan, Metal, D3D12) that it used. The idea was that the compiler could see exactly which backend types and functions `wgpu_core` was using and could inline and optimize accordingly. It also put our build times through the roof. So, to see if we could do something about the build times, Wumpf experimented with making the `wgpu_hal` API use dynamic dispatch instead. For reasons that are not clear to me, switching from generics to dynamic dispatch made WGPU faster, substantially so on some benchmarks. (The tradeoff is sketched below.)
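In miniature, the two designs look like this (illustrative only, not WGPU's actual traits). One plausible explanation for the result is that the monomorphized copies blow up code size and instruction-cache pressure, while the single `dyn` copy stays hot:

```rust
trait Backend {
    fn submit(&self, commands: &[u32]);
}

// Generics: monomorphized, so a full copy of this function is compiled
// for every backend type that instantiates it.
fn submit_generic<B: Backend>(backend: &B, commands: &[u32]) {
    backend.submit(commands);
}

// Dynamic dispatch: compiled once; each call goes through a vtable.
fn submit_dyn(backend: &dyn Backend, commands: &[u32]) {
    backend.submit(commands);
}
```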

Animats posts frequently about performance problems they're running into, but when they do, it's always a huge pile of unanalyzed data. It's almost as if they run into a performance problem with their code and then, rather than figuring out what's going on themselves, throw their whole app over the wall and ask WGPU to debug it. That is just not a service we offer.

  • He's reporting a 23% drop in performance and seems to have invested quite some time in pinning down what's causing it, plus he's provided a repro repository with benchmarks.

    I honestly don't get your annoyed response; any OSS project wishes they had such detailed bug reports, and such a performance regression would concern me very much if it happened in a project I maintain.

  • What? They even provided a benchmarking tool. You should be ecstatic at users providing such detailed reports. Most projects just attract reports that go like "its slow, fix it!!111"

> When it dropped 21% recently and I pointed that out, some people were very annoyed.[1]

Someone was seemingly "annoyed" by an impatient end-user asking for a status update ("It's now next week. Waiting.") and nothing more. They didn't seem to be annoyed that you pointed out a performance issue; instead they explained that their current focus is elsewhere.

Tbh, as an open source developer, I was annoyed reading it too. The people you are talking to are volunteering their time, and you weren't very considerate of that. Open source software doesn't have the same support structure as paid software: you don't file tickets and expect them to be promptly fixed, unless you do the legwork yourself.

Tbf, tons of games have been created and are still being created without bindless resource binding. While WebGPU does have some surprising performance bottlenecks around setBindGroup(), details like that hardly make or break a game. If the devs are somewhat competent, they'll come up with ways to work around 3D API limitations - that's how it's always been and always will be. The old batching tricks from the D3D9 era still sometimes make sense (sketched below); I wonder if people simply forgot about those, or don't know them in the first place because it was before their time.
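For anyone who hasn't seen those tricks: the classic move is to draw many objects that share a material in one call, e.g. via instancing, so binding cost is paid once per batch rather than once per object. A minimal wgpu-flavored sketch (names illustrative):

```rust
/// Per-instance data; the element type of `instance_buffer` below.
#[repr(C)]
struct Instance {
    model_matrix: [[f32; 4]; 4],
}

// One bind, one draw, N objects: the D3D9-era batching idea, translated.
fn draw_batched(
    pass: &mut wgpu::RenderPass<'_>,
    mesh_vertices: &wgpu::Buffer,
    instance_buffer: &wgpu::Buffer, // all Instance transforms, uploaded once
    vertex_count: u32,
    instance_count: u32,
) {
    pass.set_vertex_buffer(0, mesh_vertices.slice(..));
    pass.set_vertex_buffer(1, instance_buffer.slice(..));
    pass.draw(0..vertex_count, 0..instance_count);
}
```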

  • Nobody forgot about batching. It's a foundational strategy in any efficient realtime renderer. The bar has simply moved and even the cheaper binding logic you get from Vulkan or D3D12 is getting too expensive for the object counts we're trying to push in modern games.

    Bindless lets you reduce the amount of bookkeeping you have to do per-object on the CPU but, much more importantly, it opens the door for GPU-driven rendering (sketched below).

    The problem with WebGPU is that there's no bindless, and the "bindful" path is made quite expensive by the safety requirements of a browser API. There's no way around the slow path, and the slow path is quite slow. In this case the workaround is to cut features, because the API simply imposes too much overhead.
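    Concretely, "GPU driven" means a compute pass decides what is visible and writes the draw arguments itself; the CPU fires a single indirect call. A hedged sketch using wgpu's native-only multi-draw feature (requires wgpu::Features::MULTI_DRAW_INDIRECT; not available in the browser):

    ```rust
    // The CPU never touches per-object state here: a culling compute
    // shader has already written indirect draw-argument records into
    // `indirect_buffer`.
    fn draw_gpu_driven(
        pass: &mut wgpu::RenderPass<'_>,
        indirect_buffer: &wgpu::Buffer,
        draw_count: u32,
    ) {
        // One call on the CPU, up to `draw_count` draws on the GPU.
        pass.multi_draw_indirect(indirect_buffer, 0, draw_count);
    }
    ```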

    • BindGroups being a hard-to-fix design wart is true indeed. I have been complaining about them pretty much from the beginning, not because of the performance problems (which surprised me too) but because of their inflexibility compared to a traditional bind-slot-based model like in Metal 1 or D3D11.

      But I would prefer to first bring the performance of the slot-based binding model to a point where it is similar to D3D11 or Metal, instead of ignoring that part of the API and "skipping ahead" to bindless (which will probably have to be behind an extension anyway). Otherwise WebGPU will become a cemetery of abandoned attempts, like OpenGL.
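      In the meantime, the usual application-side mitigation is to cache bind groups, so creation and validation happen once and the per-frame cost is just a lookup plus setBindGroup. A sketch with app-assigned resource IDs as the cache key (names hypothetical):

      ```rust
      use std::collections::HashMap;

      /// Hypothetical cache: a (texture, sampler) pair creates its bind
      /// group once and reuses it on every subsequent frame.
      struct BindGroupCache {
          groups: HashMap<(u64, u64), wgpu::BindGroup>,
      }

      impl BindGroupCache {
          fn get_or_create(
              &mut self,
              device: &wgpu::Device,
              layout: &wgpu::BindGroupLayout,
              key: (u64, u64), // app-side IDs for (texture_view, sampler)
              view: &wgpu::TextureView,
              sampler: &wgpu::Sampler,
          ) -> &wgpu::BindGroup {
              self.groups.entry(key).or_insert_with(|| {
                  device.create_bind_group(&wgpu::BindGroupDescriptor {
                      label: None,
                      layout,
                      entries: &[
                          wgpu::BindGroupEntry {
                              binding: 0,
                              resource: wgpu::BindingResource::TextureView(view),
                          },
                          wgpu::BindGroupEntry {
                              binding: 1,
                              resource: wgpu::BindingResource::Sampler(sampler),
                          },
                      ],
                  })
              })
          }
      }
      ```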

As far as I know, Unity doesn't support bindless either. However, thousands of Unity games are released on Steam every year. So it's safe to say performance isn't the main (or even a major) reason why Rust gamedev isn't getting much traction.

  • The lack of traction is mostly because Rust game development, with the exception of the Bevy efforts, is still pretty much in the dark ages of everything-is-code.

    The industry has moved beyond that, to teams where programmers play only a minor (though quite important) role in the overall game design, with plenty of tooling for designers and other non-programmer folks to do their tasks.

    Eventually, with more graphical tooling or scripting systems, it will start to gain more steam.

    Note that Tiny Glade also created most of its tooling in-house; it only partially depends on Bevy.

Another reason is that exploratory programming is hard by design in Rust. Rust is great if you already have a spec and know what needs to be done.

Most gamedev, in my opinion, is extremely exploratory and demands constant experimentation with design. C/C++ offers fluidity, a very good and mature debug toolchain, a solid performance ceiling, and support from other people.

It will be really hard to replace C++ in performance/simulation contexts. Security takes a backseat there.

Author of Renderling here. Thanks for the shout-out, Animats!

Bindless is a game changer - pun intended. It can’t happen soon enough.

Just curious, what are the three major projects that were cancelled?

I also want to mention that folks are shipping high-performance games in Rust - the first title that comes to mind is “Tiny Glade”, which is breathtakingly gorgeous, though it is a casual game. It does not run on wgpu though, to my knowledge. I may have a different definition of high performance, with lower expectations.

  • > What are the three major projects that were cancelled?

    Here are some:

    - LogLog Games [1]. Not happy with Bevy. Not too unhappy about performance, although it's mentioned.

    - Moonlight Coffee [2]. Not a major project, but he got as far as loading glTF and displaying the results, then quit. That's a common place to give up.

    - Hexops [3]. Found Rust "too hard", switched to Zig.

    Tiny Glade is very well done. But, of course, it's a tiny glade. This avoids the scaling problems.

    [3] https://devlog.hexops.com/2021/increasing-my-contribution-to...

    • It's crazy you've cited Hexops as an example:

      1. It's a game studio not a project (CEO here :))

      2. It's very much still alive and well today, not 'cancelled'

      3. We never even used WebGPU in Rust, this was before WebGPU was really a thing.

      It is true that we looked elsewhere for a better language for us with different tradeoffs, and have since fully embraced Zig. It's also true that we were big proponents of WebGPU early on, and in recent years have abandoned WebGPU in favor of something that's better for graphics outside the browser (that's its own worthwhile story).

      But we've never played /any/ role in the Rust gamedev ecosystem, really.


    • None of those are “major projects” by any definition of the term, though. And none of the three has anything to do with wgpu's performance.

      Rust for game engines has always been a highly risky endeavor, since the ecosystem is much less mature than everything else; even though things have improved a ton over the past few years, it's still light-years away from the mainstream tools.

      Building a complete game ecosystem is very hard and it's not surprising to see that Rust is still struggling.

I thought WGPU only supported WebGPU, and then there were translation libraries (akin to Proton) to run WebGPU over Vulkan.

Does it directly, internally, support Vulkan instead of on-the-fly translation from WebGPU to VK?

  • WGPU (https://wgpu.rs/) is one of currently three implementations of the WebGPU specification (the other two being Google's Dawn library used in Chrome, and the implementation in WebKit used in Safari).

    The main purpose of WebGPU is to specify a 3D API over the common subset of Metal/D3D12/Vulkan features (i.e. doing an 'on-the-fly translation' of WebGPU API calls to Metal/D3D12/Vulkan API calls), very similar to how (a part of) Proton does an on-the-fly translation of the various D3D API versions to Vulkan.

    • You're describing the WebGPU spec and its different implementations.

      OP claimed WGPU had native support for VK, DX and others. But as far as I know, WGPU just supports WebGPU being translated on the fly to those other backends, with the obvious performance hit. If I'm wrong I'd be interested to know, as WGPU would be a more interesting choice for many if the code were native instead of translated.

      Edit: per https://docs.rs/wgpu/latest/wgpu/#backends it seems they do indeed support native code for almost every backend?


The tone of the thread was perfectly fine until you made a passive-aggressive comment.