Comment by flohofwoe

1 month ago

First: very nice project, kudos! But:

> It’s way more efficient, I ran a benchmark rendering 10k rectangles on a canvas and the difference was huge: Emscripten hit around 40 FPS, while my setup hit 100 FPS.

This sounds a bit suspicious tbh.

For instance in this Emscripten WebGL2 sample I can move the instance slider to about 25000 before the frame rate drops below 120 fps on my 2021 M1 MBP in Chrome:

https://floooh.github.io/sokol-html5/drawcallperf-sapp.html

Each 'instance draw' is doing one glUniform4iv and one glDrawElements call via Emscripten's regular WebGL2 shim, i.e. 50k calls across the WASM/JS boundary per 120Hz frame, and I'm pretty sure the vast bulk of the execution time is inside the WebGL2 implementation and the actual call overhead from WASM to JS is negligible (also see: https://hacks.mozilla.org/2018/10/calls-between-javascript-a... - i.e. wasm-to-js call overhead has been in the nanosecond range since around 2018).
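As a back-of-the-envelope check of that claim, here is a small sketch of the per-call time budget. It assumes exactly two calls per instance and a steady 120 Hz; `perCallBudgetNs` is a made-up helper name, and the result is plain arithmetic, not a measurement.

```typescript
// Average time available per WASM->JS call if the calls used up the whole frame.
// Hypothetical helper; the inputs are the figures quoted in the comment above.
function perCallBudgetNs(callsPerFrame: number, framesPerSecond: number): number {
  const frameTimeNs = 1e9 / framesPerSecond; // one frame in nanoseconds
  return frameTimeNs / callsPerFrame;
}

const instances = 25000;
const callsPerInstance = 2; // one glUniform4iv + one glDrawElements
const budgetNs = perCallBudgetNs(instances * callsPerInstance, 120);

console.log(budgetNs.toFixed(1)); // 166.7
```

So each call, including everything that happens inside the WebGL2 implementation, has to fit in roughly 167 ns on average for the frame rate to hold.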

Still, very cool idea to have this command batching, but I doubt that the performance improvement can be explained with the WASM-JS call overhead alone. There must be something else going on: maybe it's as simple as the command buffer approach being more cache-friendly, or the tight decoding loop on the JS side allowing more JIT optimizations by the JS engine. But the differences you saw are still baffling, because IME 10k fairly simple operations in 25 or 10 milliseconds (i.e. 40 or 100 fps) are not enough to see much of a difference from CPU caches or inefficient JIT code generation. By far most of the time should be spent inside the WebGL2 implementation, and that should be the same no matter whether a traditional shim or the command buffer approach is used.

Thanks for the feedback! You're absolutely right to question this.

Just to clarify: my benchmark was using Canvas2D, not WebGL, which is why the numbers are much lower than in your WebGL2 example. Based on your comment I removed the command batching to test the difference, and yeah, the batching optimization is smaller than I initially thought. WebCC with batched commands hits ~100 FPS, without batching it's ~86 FPS, and Emscripten is ~40 FPS. So the batching itself only contributes ~14 FPS.
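FPS deltas are easier to compare as frame times, since FPS is not linear in the work done per frame. A quick sketch using only the three figures quoted above (the variable names are mine):

```typescript
// Convert frames per second to milliseconds per frame.
function frameTimeMs(fps: number): number {
  return 1000 / fps;
}

const emscriptenMs = frameTimeMs(40); // 25 ms
const unbatchedMs = frameTimeMs(86);  // ~11.63 ms
const batchedMs = frameTimeMs(100);   // 10 ms

// Time saved per frame by each step.
const savedByBatchingMs = unbatchedMs - batchedMs;      // ~1.63 ms
const savedByEverythingElseMs = emscriptenMs - unbatchedMs; // ~13.37 ms

console.log(savedByBatchingMs.toFixed(2), savedByEverythingElseMs.toFixed(2));
```

In frame-time terms, batching accounts for roughly 1.6 ms per frame, while the remaining gap to the Emscripten build is roughly 13.4 ms.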

The bigger performance difference compared to Emscripten seems to come from how Canvas2D operations are handled. Emscripten uses their val class for JS interop which wraps each canvas call in their abstraction layer. WebCC writes raw commands (opcode + arguments) directly into a buffer that the JS side decodes with a tight switch statement. The JS decoder already has direct references to canvas objects and can call methods immediately without property lookups or wrapper overhead. With 10k draw calls per frame, these small per-call differences (property access, type boxing/unboxing, generic dispatch) compound significantly.
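For readers who want to picture the approach, here is a minimal sketch of the "opcode + arguments in a buffer, decoded by a tight switch" idea. It is not WebCC's actual encoding, which isn't shown in this thread: the opcodes, the `CommandBuffer` class, the `RectTarget` interface and the `flush` function are all hypothetical, and the producer is written in TypeScript here only to keep the example self-contained (in the real setup it would be the WASM side writing into linear memory).

```typescript
// Hypothetical opcodes, invented for this sketch.
const OP_FILL_RECT = 1;
const OP_CLEAR_RECT = 2;

// The subset of CanvasRenderingContext2D the decoder needs.
interface RectTarget {
  fillRect(x: number, y: number, w: number, h: number): void;
  clearRect(x: number, y: number, w: number, h: number): void;
}

// Producer side: stands in for the WASM module appending raw commands.
class CommandBuffer {
  readonly data: Float32Array;
  length = 0;

  constructor(capacity: number) {
    this.data = new Float32Array(capacity);
  }

  // Each command is five slots: opcode followed by x, y, w, h.
  push(op: number, x: number, y: number, w: number, h: number): void {
    if (this.length + 5 > this.data.length) {
      throw new RangeError("command buffer full");
    }
    const d = this.data;
    let i = this.length;
    d[i++] = op;
    d[i++] = x;
    d[i++] = y;
    d[i++] = w;
    d[i++] = h;
    this.length = i;
  }
}

// Consumer side: one pass over the buffer, dispatching through a switch.
// The target is resolved once up front, not looked up per call.
function flush(buf: CommandBuffer, ctx: RectTarget): number {
  const d = buf.data;
  const end = buf.length;
  let i = 0;
  let executed = 0;
  while (i < end) {
    switch (d[i]) {
      case OP_FILL_RECT:
        ctx.fillRect(d[i + 1], d[i + 2], d[i + 3], d[i + 4]);
        break;
      case OP_CLEAR_RECT:
        ctx.clearRect(d[i + 1], d[i + 2], d[i + 3], d[i + 4]);
        break;
      default:
        throw new Error(`unknown opcode ${d[i]} at offset ${i}`);
    }
    i += 5;
    executed++;
  }
  buf.length = 0; // ready for the next frame
  return executed;
}

// Usage with a recording stand-in for a real canvas context.
const calls: string[] = [];
const recorder: RectTarget = {
  fillRect(x, y, w, h) { calls.push(`fill ${x} ${y} ${w} ${h}`); },
  clearRect(x, y, w, h) { calls.push(`clear ${x} ${y} ${w} ${h}`); },
};

const frame = new CommandBuffer(1024);
frame.push(OP_CLEAR_RECT, 0, 0, 640, 480);
frame.push(OP_FILL_RECT, 10, 20, 30, 40);
const executedCount = flush(frame, recorder);
```

In a browser, a real `CanvasRenderingContext2D` could be passed as `ctx`, since it has both methods. Using a `Float32Array` for everything is a simplification: arguments are rounded to 32-bit floats, which is fine for the small integers used here.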

  • > Emscripten uses their val class for JS interop which wraps each canvas call in their abstraction layer.

    This is a C++ embind thing, right? At least the WebGL2 shim doesn't use that (and IMHO embind should never be used when performance matters), but that might actually explain a lot of the difference.