Comment by winwang

1 year ago

Personal opinion: it's the software (and software tooling).

The hardware is good enough (even if we're only talking 10x efficiency). Part of the issue seems slightly cultural, i.e. repeatedly dismissing the idea of traditional task parallelism (as opposed to massive SIMD/data parallelism) on GPUs. Obviously, you'd lose a lot of efficiency if you literally ran 1 thread per warp. But it could be useful for lightly data-parallel tasks (like typical CPU vectorization), or for using warp-wide semantics to implement something like a "software" microcode engine. Dumb example: implementing division as long division built out of multiplications and shifts.
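To make the "dumb example" concrete, here's a minimal sketch of software division via restoring long division, built only from shifts, compares, and subtracts (a multiply-by-approximate-reciprocal variant is the other common approach). The function name `soft_div_u32` is hypothetical; plain C is used for illustration, but the same routine could run per-thread or warp-wide on a GPU:

```c
#include <stdint.h>
#include <assert.h>

// Hypothetical software division: restoring long division using only
// shifts, compares, and subtracts -- the kind of routine a warp-wide
// "software microcode engine" could run where a hardware divide is
// slow or absent.
static uint32_t soft_div_u32(uint32_t n, uint32_t d) {
    assert(d != 0);
    uint32_t q = 0, r = 0;
    for (int i = 31; i >= 0; i--) {
        r = (r << 1) | ((n >> i) & 1);  // bring down the next bit of n
        if (r >= d) {                   // does d fit into the partial remainder?
            r -= d;
            q |= 1u << i;               // set the corresponding quotient bit
        }
    }
    return q;
}
```

On real GPUs this loop has no data-dependent branching beyond a predicated subtract, so all lanes of a warp stay in lockstep, which is exactly why this style of routine maps well onto SIMT hardware.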

Other things a GPU gives you: insanely high memory bandwidth, a programmable cache (shared memory), and (relatively) great atomic operations.