Comment by sitkack
1 year ago
This essay needs more work.
Are you arguing for a better software abstraction, a different hardware abstraction, or both? Lots of esoteric machines are name-dropped, but it isn't clear how they help your argument.
Why not link to Vello? https://github.com/linebender/vello
I think a stronger essay would, at the end, give the reader a clear view of what Good means, how to decide whether one machine is closer to Good than another, and why.
SIMD machines can be turned into MIMD machines. Even hardware problems still need a software solution. The hardware is there to offer the right affordances for the kinds of software you want to write.
Lots of words that are in the eye of the beholder. We need a checklist, or that Good parallel computer won't be built.
Personal opinion: it's the software (and software tooling).
The hardware is good enough (even if we're only talking 10x efficiency). Part of the issue seems slightly cultural, i.e., repeatedly dismissing the idea of traditional task parallelism (not-super-SIMD/data parallelism) on GPUs. Obviously, one would lose a lot of efficiency if we literally ran 1 thread per warp. But it could be useful for lightly-data-parallel tasks (like typical CPU vectorization), or maybe for using warp-wide semantics to implement something like a "software" microcode engine. Dumb example: implementing division as long division, using multiplications and shifts.
Other things a GPU gives: insanely high memory bandwidth, programmable cache (shared memory), and (relatively) great atomic operations.
I agree.
Many things in software fall into the "you're doing it wrong" bucket, but what counts as wrong is subjective and arbitrary.
> maybe using warp-wide semantics to implement something like a "software" microcode engine.
https://github.com/beehive-lab/ProtonVM
Thanks for the share (and reminder)! Turns out I had this bookmarked somehow, lol.
> Are you arguing for a better software abstraction, a different hardware abstraction or both?
I don't speak for Raph, but imo it seems like he was arguing for both, and I agree with him.
On the hardware side, GPUs have struggled with dynamic workloads at the API level (not e.g. thread-level dynamism, that's a separate topic) for around a decade. Indirect commands gave you some of that, so at least the size of your data/workload could be variable even if the workloads themselves couldn't. Then mesh shaders gave you a little more access to geometry processing, and finally workgraphs and device-generated commands let you have an actually dynamically defined workload (e.g. completely skipping dispatches for shading materials that weren't used on screen this frame). However, it's still very early days, and the performance issues and lack of easy portability are problematic. See https://interplayoflight.wordpress.com/2024/09/09/an-introdu... for instance.
On the software side, shading languages have been garbage for far longer than hardware has been a problem. It's only in the last year or two that a proper language server for writing shaders has even existed (Slang's LSP). To say nothing of the innumerable driver-compiler bugs, the lack of well-defined semantics and a memory model until the last few years, or the fact that we're still manually dividing work into correctly sized, cache-aware chunks.
Absolutely. And the fact that we need to evolve both is one of the reasons progress has been difficult.