Comment by Calavar

12 hours ago

I believe they mean GPU threads. Plenty of cuda files in their repository.

4 comments

Calavar

Fair enough, but that's then at most 1024 threads per block and roughly 1536 to 2048 resident threads per SM depending on the architecture, which wouldn't get anywhere near 1 million, given the 5090 only has 170 SMs...

Future-proofing, I guess...

cyber_kinetist 12 hours ago

You can launch many more logical threads than there are physical threads available. The GPU scheduler will automatically dispatch the work to the SMs as resources free up.
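
To put rough numbers on that, here is a back-of-the-envelope sketch in Python (not CUDA code, and not taken from the repository being discussed). The SM count, the resident-thread limit per SM, and the block size are all assumptions for illustration, and the function name `launch_plan` is hypothetical. It also ignores register and shared-memory limits, which can lower real occupancy.

```python
import math

# Assumed figures for illustration only; adjust them for your GPU.
NUM_SMS = 170                       # assumed SM count
MAX_RESIDENT_THREADS_PER_SM = 1536  # assumed per-SM resident-thread limit
THREADS_PER_BLOCK = 256             # hypothetical launch configuration
LOGICAL_THREADS = 1_000_000         # the "1 million threads" from the thread


def launch_plan(logical_threads, threads_per_block, num_sms, resident_per_sm):
    """Estimate how a launch larger than the hardware is split into waves."""
    # Number of blocks needed to cover every logical thread.
    blocks = math.ceil(logical_threads / threads_per_block)
    # Threads that can be resident on the whole GPU at one time.
    resident_threads = num_sms * resident_per_sm
    # Whole blocks that fit on the GPU at one time.
    resident_blocks = num_sms * (resident_per_sm // threads_per_block)
    # Rough count of scheduling "waves" needed to run every block.
    waves = math.ceil(blocks / resident_blocks)
    return blocks, resident_threads, resident_blocks, waves


blocks, resident_threads, resident_blocks, waves = launch_plan(
    LOGICAL_THREADS, THREADS_PER_BLOCK, NUM_SMS, MAX_RESIDENT_THREADS_PER_SM
)
print(f"blocks launched:        {blocks}")
print(f"resident threads (max): {resident_threads}")
print(f"resident blocks (max):  {resident_blocks}")
print(f"approximate waves:      {waves}")
```

Under these assumptions, the million logical threads never have to be resident at once: the scheduler hands out new blocks as earlier ones finish, so the launch completes in a handful of waves.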

ks6g10 8 hours ago

Just as two threads can execute on the same core at the "same" time (i.e. interleaved, with no synchronization between them), the same is true for GPU threads and thread groups.

zipy124 11 hours ago

I guess they never technically say that the threads execute at the same time, haha