Comment by winwang
1 year ago
Personal opinion: it's the software (and software tooling).
The hardware is good enough (even if we're only talking 10x efficiency). Part of the issue seems slightly cultural, i.e. repeatedly dismissing the idea of traditional task parallelism (not-super-SIMD/data parallelism) on GPUs. Obviously, you'd lose a lot of efficiency if you literally ran 1 thread per warp. But it could be useful for lightly-data-parallel tasks (like typical CPU vectorization), or maybe for using warp-wide semantics to implement something like a "software" microcode engine. Dumb example: implementing division as long division, using multiplications and shifts.
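To make that dumb example concrete, here's a minimal sketch of the multiply-and-shift division trick (plain Python just to illustrate the arithmetic; the magic constant 0xCCCCCCCD with shift 35 is the standard one for unsigned 32-bit division by 10, per Hacker's Delight — on a GPU you'd do the widening multiply in registers, no divide instruction needed):

```python
# Division by a constant via multiply + shift: the kind of primitive a
# "software microcode engine" could lean on where hardware divide is
# slow or absent. For all 0 <= n < 2**32:
#     n // 10 == (n * 0xCCCCCCCD) >> 35

MAGIC = 0xCCCCCCCD  # ceil(2**35 / 10)
SHIFT = 35

def div10(n: int) -> int:
    """Unsigned 32-bit division by 10 without a divide instruction."""
    assert 0 <= n < 2**32
    return (n * MAGIC) >> SHIFT  # one widening multiply, one shift

# Spot-check against real division, including the edge cases.
for n in (0, 9, 10, 123456789, 2**32 - 1):
    assert div10(n) == n // 10
```

(Python's integers are arbitrary-precision, so the 32x34-bit product just works here; in a real kernel you'd use a 64-bit intermediate.)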
Other things a GPU gives: insanely high memory bandwidth, programmable cache (shared memory), and (relatively) great atomic operations.
I agree.
A lot of things in software get filed under "you're doing it wrong," but what counts as the wrong way is subjective and arbitrary.
> maybe using warp-wide semantics to implement something like a "software" microcode engine.
https://github.com/beehive-lab/ProtonVM
Thanks for the share (and reminder)! Turns out I had this bookmarked somehow, lol.
Email is on my profile; drop me a line if you want to discuss GPU meta machines.