Comment by textlapse
9 days ago
What's the performance like? What would the benefits be of converting a streaming multiprocessor programming model to this?
We aren't focused on performance yet (it is often workload- and executor-dependent, and as the post says we currently do some inefficient polling), but Rust futures compile down to state machines, so they are a zero-cost abstraction.
The anticipated benefits are similar to the benefits of async/await on CPU: better ergonomics for the developer writing concurrent code, better utilization of shared/limited resources, fewer concurrency bugs.
Warps are expensive - essentially you're issuing 'don't run this code' instructions to maintain SIMT.
GPUs are still not practically Turing-complete, in the sense that there are strict restrictions on loops/goto/IO/waiting (there are a bunch of band-aids to make it pretend it isn't a functional programming model).
So I am not sure retrofitting a Ferrari to cosplay an Amazon delivery van is useful other than for tech showcase?
Good tech showcase though :)
I think you're conflating GPU 'threads' and 'warps'. GPU 'threads' are SIMD lanes that are all running with the exact same instructions and control flow (only with different filtering/predication), whereas GPU warps are hardware-level threads that run on a single compute unit. There's no issue with adding extra "don't run code" when using warps, unlike GPU threads.