Comment by petermcneeley
3 days ago
In actual implementations they are very much like very wide SIMD on a CPU core. Each warp maps to a different HW thread, since each warp can execute different instructions.
This mapping is so close that translation from GPU to CPU is relatively easy and performant.
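A rough sketch of that mapping in plain C++, not taken from any real translator: the outer loop stands in for warps (one per HW thread), the inner loop over lanes is what a compiler could vectorize as wide SIMD, and a per-lane mask stands in for divergence. The names `kWarpSize`, `saxpy_positive_warp` and `saxpy_positive` are made up for illustration, and the lane count of 32 is an assumption borrowed from NVIDIA's warp width.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Assumed lane count (hypothetical constant); 32 matches NVIDIA's warp width.
constexpr std::size_t kWarpSize = 32;

// One warp's worth of a kernel that does y[i] = a * x[i] + y[i] where x[i] > 0.
// All lanes follow the same instruction stream; the mask disables lanes that
// are out of range or that took the other side of the branch.
void saxpy_positive_warp(std::size_t warp, float a,
                         const std::vector<float>& x, std::vector<float>& y) {
    const std::size_t base = warp * kWarpSize;
    std::array<bool, kWarpSize> active{};

    // Build the execution mask for this warp.
    for (std::size_t lane = 0; lane < kWarpSize; ++lane) {
        const std::size_t i = base + lane;
        active[lane] = i < x.size() && x[i] > 0.0f;
    }

    // Masked lane loop: the candidate for wide SIMD on a CPU core.
    for (std::size_t lane = 0; lane < kWarpSize; ++lane) {
        if (active[lane]) {
            const std::size_t i = base + lane;
            y[i] = a * x[i] + y[i];
        }
    }
}

// Warps are independent, so each iteration could run on its own HW thread.
// Here they simply run one after another. Assumes x and y have equal length.
void saxpy_positive(float a, const std::vector<float>& x, std::vector<float>& y) {
    const std::size_t warps = (x.size() + kWarpSize - 1) / kWarpSize;
    for (std::size_t w = 0; w < warps; ++w) {
        saxpy_positive_warp(w, a, x, y);
    }
}
```

Calling `saxpy_positive(2.0f, x, y)` on two equal-length vectors updates only the elements where `x[i]` is positive. Running the warp loop in parallel would be the obvious next step.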