Comment by jms55

8 months ago

Bindless is pretty much _the_ most important feature we need in WebGPU. Other stuff can be worked around to varying degrees of success, but lack of bindless makes our state changes extremely frequent, which heavily kills performance with how expensive WebGPU makes changing state. The default texture limits without bindless are also way too small for serious applications - just implementing the glTF PBR spec + extensions will blow past them.

I'm really looking forward to getting bindless later down the road, although I expect it to take quite a while.

By the same token, I'm quite surprised that effort is being put into a compatibility mode, when WebGPU is already too old and limiting for a lot of people, and when WebGL(2) is going to have to be maintained by browsers anyways.

76 comments


Animats 8 months ago

> Bindless is pretty much _the_ most important feature we need in WebGPU. Other stuff can be worked around to varying degrees of success, but lack of bindless makes our state changes extremely frequent, which heavily kills performance with how expensive WebGPU makes changing state.

Yes.

This has had a devastating effect on Rust 3D graphics. The main crate for doing 3D graphics in Rust is WGPU. WGPU supports not just WebGPU, but Android, Vulkan, Metal, Direct-X 12, and OpenGL. It makes them all look much like Vulkan. Bevy, Rend3, and Renderling, the next level up, all use WGPU. It's so convenient.

WGPU has lowest common denominator support. If WebGPU can't do something inside a browser, then WGPU probably can't do it on other platforms which could handle it. So WGPU makes your gamer PC perform like a browser or a phone. No bindless, no multiple queues, and somewhat inefficient binding and allocation.

This is one reason we don't see high-performance games written in Rust.

After four years of development, WGPU performance has gone down, not up. When it dropped 21% recently and I pointed that out, some people were very annoyed.[1]

Google pushing bindless forward might help get this unstuck. Although notice that the target date on their whiteboard is December 2026. I'm not sure that game dev in Rust has that much runway left. Three major projects have been cancelled and the main site for Rust game dev stopped updating in June 2024.[2]

[1] https://github.com/gfx-rs/wgpu/issues/6434

[2] https://gamedev.rs/

jms55 8 months ago
> This is one reason we don't see high-performance games written in Rust.
Rendering is _hard_, and Rust is an uncommon toolchain in the gamedev industry. I don't think wgpu has much to do with it. Vulkan via ash and DirectX12 via windows-rs are both great options in Rust.
> After four years of development, WGPU performance has gone down, not up. When it dropped 21% recently and I pointed that out, some people were very annoyed.[1]
Performance isn't the priority at the moment for most of the wgpu maintainers (who are paid by Mozilla). Fixing bugs and implementing missing features so that they can ship WebGPU support in Firefox is more important. The other maintainers are volunteers with no obligation besides finding it enjoyable to work on. Performance can always be improved later, but getting working WebGPU support to users so that websites can start targeting it is crucial. The annoyance is that you were rude about it.
> Google pushing bindless forward might help get this unstuck. Although notice that the target date on their whiteboard is December 2026.
The bindless stuff is basically "developers requested it a ton when we asked for feedback on features they wanted (I was one of those people who gave them feedback), and we had some draft proposals from (iirc) 1-2 different people". It's wanted, but there are still major questions to answer. It's not like this is a set thing they've been developing and are preparing to release. All the features listed are just feedback from users and discussion that took place at the WebGPU face to face recently.
- jblandy 8 months ago
  
  WGPU dev here. I agree with everything JMS55 says here, but I want to avoid a potential misunderstanding. Performance is definitely a priority for WGPU, the open source project. Much of WGPU's audience is very concerned with performance.
  My team at Mozilla are active contributors to WGPU. For the moment, when we Mozilla engineers are prioritizing our own work, we are focused on compatibility and safety, because that's what we need most urgently for our use case. Once we have shipped WebGPU in Firefox, we will start putting our efforts into other things like performance, developer experience, and so on.
  But WGPU has other contributors with other priorities. For example, WGPU just merged some additions to its nascent ray tracing support. That's not a Mozilla priority, but WGPU took the PR. Similarly for some recent extensions to 64-bit atomics (which I think is used by Bevy for Nanite-like techniques?), and other areas.
  WGPU is an open source project. We at Mozilla contribute to the features we need; other people contribute to what they care about; and the overall direction of the project is determined by what capable contributors put in the time to make happen.
  
- Animats 8 months ago
  
  > Rendering is _hard_, and Rust is an uncommon toolchain in the gamedev industry. I don't think wgpu has much to do with it. Vulkan via ash and DirectX12 via windows-rs are both great options in Rust.
  Yes. I think I'm beginning to see what's gone wrong with the Rust crates. It's an architectural problem. Vulkano and WGPU try to create a Rust safety perimeter at an API that's basically a wrapper around Vulkan. This may be the wrong boundary for that safety perimeter.
  Moving buffer allocation inside the safety perimeter may eliminate a level of locking and checking. Bindless really brings this out, because somebody has to keep the descriptor table and buffer allocation in sync. The GPU depends on that. So that has safety implications.
  If this problem is partitioned differently, the locking problems for concurrent GPU content updating may become simpler. Right now, both Vulkano and WGPU force more serialization than Vulkan itself requires. The rendering thread is too often stalled on a lock waiting for some content updating operation that should not interfere with rendering.
  Too much detail for this forum. I'll continue this elsewhere. This has been useful.
  
- kookamamie 8 months ago
  
  > implementing missing features so that they can ship WebGPU support in Firefox
  Sounds like WGPU, the project, should be detached from Firefox?
  To me the priority of shipping WGPU on FF is kind of mind-boggling, as I consider the browser irrelevant at this point in time.
  
jblandy 8 months ago
There have been a bunch of significant improvements to WGPU's performance over the last few years.
* Before the major rework called "arcanization", `wgpu_core` used a locking design that caused huge amounts of contention in any multi-threaded program. It took write locks so often I doubt you could get much parallelism at all out of it. That's all been ripped out, and we've been evolving steadily towards a more limited and reasonable locking discipline.
* `wgpu_core` used to have a complex system of "suspected resources" and deferred cleanup, apparently to try to reduce the amount of work that needed to be done when a command buffer finished executing on the GPU. This turned out not to actually save any work at all: it did exactly the same amount of bookkeeping, just at a different time. We ripped out this complexity and got big speedups on some test cases.
* `wgpu_core` used to use Rust generics to generate, essentially, a separate copy of its entire code for each backend (Vulkan, Metal, D3D12) that it used. The idea was that the code generator would be able to see exactly what backend types and functions `wgpu_core` was using, inline stuff, optimize, etc. It also put our build times through the roof. So, to see if we could do something about the build times, Wumpf experimented with making the `wgpu_hal` API use dynamic dispatch instead. For reasons that are not clear to me, switching from generics to dynamic dispatch made WGPU faster --- substantially so on some benchmarks.
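To make the last point concrete, here is a minimal sketch of the two dispatch styles. The `Backend` trait and the `Vulkan`/`Metal` types are made-up stand-ins for illustration, not the real `wgpu_hal` API, and the sketch says nothing about why one style ended up faster.

```rust
// Hypothetical stand-in for a backend API; this is NOT wgpu_hal's real trait.
trait Backend {
    fn name(&self) -> &'static str;
    fn submit(&self, commands: u32) -> u32;
}

struct Vulkan;
struct Metal;

impl Backend for Vulkan {
    fn name(&self) -> &'static str { "vulkan" }
    fn submit(&self, commands: u32) -> u32 { commands }
}

impl Backend for Metal {
    fn name(&self) -> &'static str { "metal" }
    fn submit(&self, commands: u32) -> u32 { commands }
}

// Static dispatch: the compiler emits one copy of this function (and of
// everything generic it calls) for each backend type it is used with.
fn submit_generic<B: Backend>(backend: &B, commands: u32) -> u32 {
    backend.submit(commands)
}

// Dynamic dispatch: a single compiled copy; the backend's method is looked
// up through a vtable at run time.
fn submit_dyn(backend: &dyn Backend, commands: u32) -> u32 {
    backend.submit(commands)
}
```

With the generic version, every layer above the backend gets duplicated per backend as well, which is where the build-time cost comes from. With `&dyn Backend`, backends can also be stored together, for example in a `Vec<Box<dyn Backend>>`.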
Animats posts frequently about performance problems they're running into, but when they do it's always this huge pile of unanalyzed data. It's almost as if they run into a performance problem with their code, and then rather than figuring out what's going on themselves, they throw their whole app over the wall and ask WGPU to debug the problem. That is just not a service we offer.
- ossobuco 8 months ago
  
  He's reporting a 23% drop in performance and seems to have invested quite some time in pinning down what's causing it, plus he's provided a repro repository with benchmarks.
  I honestly don't get your annoyed response; any OSS project wishes they had such detailed bug reports, and such a performance regression would concern me very much if it happened in a project I maintain.
- Animats 8 months ago
  
  This is in reference to [1].
  [1] https://github.com/gfx-rs/wgpu/issues/6434
- jillyboel 8 months ago
  
  What? They even provided a benchmarking tool. You should be ecstatic at users providing such detailed reports. Most projects just attract reports that go like "its slow, fix it!!111"
diggan 8 months ago

> When it dropped 21% recently and I pointed that out, some people were very annoyed.[1]
Someone was seemingly "annoyed" by an impatient end-user asking for a status update ("It's now next week. Waiting.") and nothing more. They didn't seem to be annoyed that you pointed out a performance issue, and instead explained that their current focus is elsewhere.
adastra22 8 months ago

Tbh I was annoyed reading it too as an open source developer. The people you are talking to are volunteering their time, and you weren’t very considerate of that. Open source software isn’t the same support structure as paid software. You don’t file tickets and expect them to be promptly fixed, unless you do the legwork yourself.
flohofwoe 8 months ago
Tbf, tons of games have been created and are still being created without bindless resource binding. While WebGPU does have some surprising performance bottlenecks around setBindGroup(), details like that hardly make or break a game. If the devs are somewhat competent they'll come up with ways to work around 3D API limitations; that's how it's always been and always will be. The old batching tricks from the D3D9 era still sometimes make sense. I wonder if people simply forgot about those, or don't know them in the first place because it was before their time.
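For anyone who hasn't seen the batching trick, a rough sketch of the idea: sort draws by the state they need so that draws sharing state are adjacent, then only rebind when the state actually changes. The `Draw` type and its fields are hypothetical, and `bind_group` simply stands in for whatever a setBindGroup() call would switch.

```rust
// Hypothetical draw record: `bind_group` stands in for the state a draw
// needs (textures, material parameters); `mesh` identifies the geometry.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Draw {
    bind_group: u32,
    mesh: u32,
}

// Count how many binding calls would be issued if the draws were submitted
// in the given order (the first draw always has to bind).
fn count_bind_changes(draws: &[Draw]) -> usize {
    let mut changes = 0;
    let mut current: Option<u32> = None;
    for d in draws {
        if current != Some(d.bind_group) {
            changes += 1;
            current = Some(d.bind_group);
        }
    }
    changes
}

// Batch by sorting on the state key so draws that share a bind group end up
// next to each other. The sort is stable, so order within a batch is kept.
fn batch_by_bind_group(draws: &mut [Draw]) {
    draws.sort_by_key(|d| d.bind_group);
}
```

Four draws that alternate between two bind groups need four binds in submission order, but only two after `batch_by_bind_group`. Real renderers sort on a wider key (pipeline, material, depth), which this sketch leaves out.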
- MindSpunk 8 months ago
  
  Nobody forgot about batching. It's a foundational strategy in any efficient realtime renderer. The bar has simply moved and even the cheaper binding logic you get from Vulkan or D3D12 is getting too expensive for the object counts we're trying to push in modern games.
  Bindless lets you reduce the amount of book keeping you have to do per-object on the CPU, but much more importantly opens the door for GPU driven rendering.
  The problem with WebGPU is there's no bindless and the 'bindful' path is quite expensive to meet the safety requirements of a browser API. There's no way around the slow path, and the slow path is quite slow. In this case the workaround is cut features because the API simply imposes too much overhead.
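  As a rough CPU-side model of what bindless changes: textures are registered once in a table, and each object just carries integer indices into it. Everything here (`TextureTable`, `ObjectData`, `build_object_data`) is hypothetical and only mimics the bookkeeping; on a real GPU the table would be a descriptor array and the lookup would happen in the shader.

  ```rust
  use std::collections::HashMap;

  // Hypothetical CPU-side model of a bindless texture table. On a real GPU
  // this would be a descriptor array; here it is just a Vec of names.
  #[derive(Default)]
  struct TextureTable {
      slots: Vec<String>,
      index_of: HashMap<String, u32>,
  }

  impl TextureTable {
      // Register a texture once and return its slot index; registering the
      // same name again reuses the existing slot.
      fn register(&mut self, name: &str) -> u32 {
          if let Some(&i) = self.index_of.get(name) {
              return i;
          }
          let i = self.slots.len() as u32;
          self.slots.push(name.to_string());
          self.index_of.insert(name.to_string(), i);
          i
      }
  }

  // Per-object record that would live in one large GPU buffer. A shader
  // would read these indices and look the textures up in the table itself.
  #[derive(Debug, PartialEq)]
  struct ObjectData {
      albedo: u32,
      normal: u32,
  }

  // Build the per-object buffer contents. No per-object binding call is
  // modeled: the table is assumed to be bound once for the whole pass.
  fn build_object_data(table: &mut TextureTable, objects: &[(&str, &str)]) -> Vec<ObjectData> {
      objects
          .iter()
          .map(|(a, n)| ObjectData { albedo: table.register(a), normal: table.register(n) })
          .collect()
  }
  ```

  Because the per-object state is plain data in a buffer, a compute pass could in principle write or cull it without the CPU touching each object, which is the door to GPU-driven rendering mentioned above.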
  
raincole 8 months ago
As far as I know, Unity doesn't support bindless either. However thousands of Unity games are released on Steam every year. So it's safe to say performance isn't the main (or major) reason why Rust gamedev isn't getting much traction.
- Animats 8 months ago
  
  That limits Unity's scene size. See [1].
  [1] https://discussions.unity.com/t/gpu-bindless-resources-suppo...
  
- pjmlp 8 months ago
  
  The lack of traction is mostly because Rust game development, with the exception of the Bevy efforts, is still pretty much in the dark ages of everything-is-code.
  The industry has moved beyond that, with teams where programmers only have a minor (though still quite important) role in the overall game design, and with plenty of tooling for designers and other non-programmer folks to do their tasks.
  Eventually with more graphical tooling, or scripting systems, it will start to gain more steam.
  Note that TinyGlade also created most of their tooling in-house, they only partially depend on Bevy.
z3phyr 8 months ago

Another reason is that exploratory programming is hard by design in Rust. Rust is great if you already have a spec and know what needs to be done.
Most of the gamedev in my opinion is extremely exploratory and demands constant experimentation with design. C/C++ offer fluidity, a very good and mature debug toolchain, solid performance ceiling and support from other people.
It will be really hard to replace C++ in performance/simulation contexts. Security takes a backseat there.
efnx 8 months ago
Author of Renderling here. Thanks for the shout out Animats!
Bindless is a game changer - pun intended. It can’t happen soon enough.
Just curious, what are the three major projects that were cancelled?
I also want to mention that folks are shipping high performance games in Rust - the first title that comes to mind is “Tiny Glade” which is breathtakingly gorgeous, though it is a casual game. It does not run on wgpu though, to my knowledge. I may have a different definition of high performance, with lower expectations.
- Animats 8 months ago
  
  > What are the three major projects that were cancelled?
  Here are some:
  - LogLog Games [1]. Not happy with Bevy. Not too unhappy about performance, although it's mentioned.
  - Moonlight Coffee [2]. Not a major project, but he got as far as loading glTF and displaying the results, then quit. That's a common place to give up.
  - Hexops. [3] Found Rust "too hard", switched to Zig.
  Tiny Glade is very well done. But, of course, it's a tiny glade. This avoids the scaling problems.
  [1] https://devlog.hexops.com/2021/increasing-my-contribution-to...
  
ladyanita22 8 months ago
I thought WGPU only supported WebGPU, and then there were translation libraries (akin to Proton) to run WebGPU over Vulkan.
Does it directly, internally, support Vulkan instead of on-the-fly translation from WebGPU to VK?
- flohofwoe 8 months ago
  
  WGPU (https://wgpu.rs/) is one of currently three implementations of the WebGPU specification (the other two being Google's Dawn library used in Chrome, and the implementation in WebKit used in Safari).
  The main purpose of WebGPU is to specify a 3D API over the common subset of Metal/D3D12/Vulkan features (e.g. doing an 'on-the-fly translation' of WebGPU API calls to Metal/D3D12/Vulkan API calls), very similar to how (a part of) Proton does an on-the-fly translation of the various D3D API versions to Vulkan.
  
klysm 8 months ago

The tone of the thread was perfectly fine until you made a passive aggressive comment

nox101 8 months ago

> The default texture limits without bindless are also way too small for serious applications

I'm not disagreeing that bindless is needed but it's a bit of hyperbole to claim the texture limits are too small for serious applications given the large list of serious graphics applications that shipped before bindless existed and the large number of serious graphics applications and games still shipping that don't use them.

jms55 8 months ago

It's partly because WebGPU has very conservative default texture limits so that they can support old mobile devices, and partly it's a problem for engines that may have a bunch of different bindings and have increasingly hacky workarounds to compile different variants with only the enabled features so that you don't blow past texture limits.
For an idea of bevy's default view and PBR material bindings, see:
* https://github.com/bevyengine/bevy/blob/main/crates/bevy_pbr...
* https://github.com/bevyengine/bevy/blob/main/crates/bevy_pbr...
elabajaba 8 months ago

They're talking about the 16 sampled texture binding limit, which is the same as WebGL2. If you look at, e.g., the list of devices that are stuck with that few texture bindings, they don't even support basic GL with compute shaders or Vulkan, so they can't even run WebGPU in the first place.
Animats 8 months ago
Yes. If you're stuck with that limitation, you pack up related textures into a big texture atlas. When you enter a new area, the player sees "Loading..." while the next batch of content is loaded. That was the state of the art 15 years ago. It's kind of dated now.
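The atlas approach boils down to remapping each texture's UVs into its sub-rectangle of the big texture. A minimal sketch, assuming a simple uniform grid layout; `AtlasRect`, `grid_rect`, and `atlas_uv` are names made up for this example, and real atlases also need padding and mip handling that is not shown.

```rust
// Hypothetical atlas entry: where one source texture sits inside the atlas,
// in normalized atlas coordinates (0.0 to 1.0).
#[derive(Clone, Copy)]
struct AtlasRect {
    u0: f32,
    v0: f32,
    width: f32,
    height: f32,
}

// Rectangle of tile `index` in an atlas laid out as a uniform grid with
// `columns` columns and `rows` rows, filled row by row.
fn grid_rect(index: u32, columns: u32, rows: u32) -> AtlasRect {
    let w = 1.0 / columns as f32;
    let h = 1.0 / rows as f32;
    AtlasRect {
        u0: (index % columns) as f32 * w,
        v0: (index / columns) as f32 * h,
        width: w,
        height: h,
    }
}

// Map a UV coordinate on the original texture to the matching UV in the atlas.
fn atlas_uv(rect: AtlasRect, u: f32, v: f32) -> (f32, f32) {
    (rect.u0 + u * rect.width, rect.v0 + v * rect.height)
}
```

In a 2x2 grid, the center of tile 3 maps to (0.75, 0.75) in the atlas. The shader then samples one bound texture instead of four, at the cost of repacking the atlas whenever the set of textures changes.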
- lpghatguy 8 months ago
  
  You might be getting “sampled textures in a single call” mixed up with “total textures loaded”. Sampled texture limits affect the complexity of your shader and have nothing to do with loading content from elsewhere.

ribit 8 months ago

Quick note: I looked at the bindless proposal linked from the blog post and their description of Metal is quite outdated. MTLArgumentEncoder has been deprecated for a while now, the layout is a transparent C struct that you populate at will with GPU addresses. There are still descriptors for textures and samplers, but these are hidden from the user (the API will maintain internal tables). It's a very convenient model and probably the simplest and most flexible of all current APIs. I'd love to see something similar for WebGPU.

jblandy 8 months ago

The nice thing about WebGPU's "compat mode" is that it's designed so browsers don't have to implement it if they don't want to. Chrome is really excited about it; Safari has no plans to implement it, ever.

I agree that compat mode takes up more of the WebGPU standard committee's time than bindless. I'm not sure that's how I would prioritize things. (As a Mozilla engineer, we have more than enough implementation work to do already, so what the committee discusses is sort of beside the point for us...)

What would be really helpful is if, once the bindless proposal <https://hackmd.io/PCwnjLyVSqmLfTRSqH0viA?view> gets merged into the spec repo <https://github.com/gpuweb/gpuweb/tree/main/proposals>, a contributor could start adapting what WGPU has now to match the proposal. Implementation experience would be incredibly valuable feedback for the committee.

modeless 8 months ago

You don't have to settle for the default limits. Simply request more.

jms55 8 months ago
We do when they're available, but I think the way browsers implement limit bucketing (to combat fingerprinting) means that some users ran into the limit.
I never personally ran into the issue, but I know it's a problem our users have had.
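To illustrate what I mean by bucketing, here is a toy sketch. The bucket values are invented for the example and are not what any browser actually uses; the only point is that the value a page sees can be lower than what the hardware supports.

```rust
// Hypothetical bucket boundaries; real browsers choose their own tiers and
// these numbers are for illustration only.
const BUCKETS: [u32; 4] = [16, 32, 64, 128];

// Report the largest bucket that does not exceed what the hardware supports,
// so that many different devices expose the same value to web pages.
// Assumes the hardware meets the smallest bucket; if it does not, the
// smallest bucket is returned anyway to keep the sketch total.
fn bucketed_limit(hardware_limit: u32) -> u32 {
    BUCKETS
        .iter()
        .copied()
        .filter(|&b| b <= hardware_limit)
        .max()
        .unwrap_or(BUCKETS[0])
}
```

Under these made-up tiers, a device that supports 48 sampled textures would be reported as 32, so an engine that needs 40 would fail on it even though the hardware could cope.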
- modeless 8 months ago
  
  That makes sense. I bet the WebGPU WG would be interested in hearing about that experience. They might be able to make changes to the buckets.
adastra22 8 months ago

Yeah I went down the rabbit hole of trying to rewrite all our shaders to work on webgpu’s crazy low limits. I’m embarrassed to say how long I worked that problem until I tried requesting higher limits, and it worked on every device we were targeting.
The default limits are like the lowest common denominator and typically way lower than what the device actually supports.
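A sketch of the decision we ended up with, written against a hypothetical `Limits` struct rather than any real WebGPU binding. The only number taken from this thread is the default of 16 sampled textures per shader stage; the function name and the fallback policy are assumptions for illustration.

```rust
// Hypothetical mirror of one WebGPU limit; a real application would use the
// limits type provided by its WebGPU implementation instead.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Limits {
    max_sampled_textures_per_shader_stage: u32,
}

// The default discussed in this thread: 16 sampled textures per stage.
const DEFAULT_LIMITS: Limits = Limits { max_sampled_textures_per_shader_stage: 16 };

// Decide what to request: ask for what the renderer wants if the adapter
// supports it, otherwise fall back to the most the adapter offers, and
// never go below the default.
fn limits_to_request(wanted: Limits, supported: Limits) -> Limits {
    let n = wanted
        .max_sampled_textures_per_shader_stage
        .min(supported.max_sampled_textures_per_shader_stage)
        .max(DEFAULT_LIMITS.max_sampled_textures_per_shader_stage);
    Limits { max_sampled_textures_per_shader_stage: n }
}
```

The result would then be passed as the required limits when requesting the device. If it comes back lower than `wanted`, the renderer can switch to a reduced shader variant instead of failing outright.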

pjmlp 8 months ago

It only goes to show the limitations of browser 3D APIs, and the huge mistake some folks make by using them for native games instead of a proper middleware engine capable of exposing modern hardware.

jms55 8 months ago
I don't necessarily disagree. But I don't agree either. WebGPU has given us as many positives as it has negatives. A lot of our user base is not on modern hardware, as much as other users are.
Part of the challenge of making a general purpose engine is that we can't make choices that specialize to a use case like that. We need to support all the backends, all the rendering features, all the tradeoffs, so that our users don't have to. It's a hard challenge.
- pjmlp 8 months ago
  
  Basically the goal of any middleware engine, since the dawn of time in the games industry.