Comment by saagarjha

17 days ago

Pretty sure CUDA will limit your thread count to hardware constraints? You can’t just request a million threads.

You can request up to 1024 threads per block, and an SM can hold 1024-2048 resident threads depending on the GPU, but each SM can only execute between 32 and 128 threads at a time! So you can have a lot more threads assigned to an SM than the SM can run at once.
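
To make that concrete, here is a rough sketch (not a benchmark) that prints the relevant limits for device 0 and then launches about a million threads as 4096 blocks of 256. The kernel `count_threads` and the helper `launch_and_count` are names made up for this example; it assumes a single CUDA-capable GPU and is built with `nvcc`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread bumps a global counter once, so the final value is the
// number of threads that actually ran.
__global__ void count_threads(unsigned long long *counter) {
    atomicAdd(counter, 1ULL);
}

// Hypothetical helper: launches blocks * threads_per_block threads and
// returns how many ran, or 0 if allocation, launch, or copy-back failed.
unsigned long long launch_and_count(unsigned blocks, unsigned threads_per_block) {
    unsigned long long *d_counter = nullptr;
    unsigned long long h_counter = 0;
    if (cudaMalloc(&d_counter, sizeof(h_counter)) != cudaSuccess) return 0;
    cudaMemset(d_counter, 0, sizeof(h_counter));

    count_threads<<<blocks, threads_per_block>>>(d_counter);

    // An oversized block is rejected at launch time, not silently clamped.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess) {
        err = cudaMemcpy(&h_counter, d_counter, sizeof(h_counter),
                         cudaMemcpyDeviceToHost);
    }
    cudaFree(d_counter);
    return err == cudaSuccess ? h_counter : 0;
}

int main() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        fprintf(stderr, "no usable CUDA device\n");
        return 1;
    }
    printf("SM count:                 %d\n", prop.multiProcessorCount);
    printf("max threads per block:    %d\n", prop.maxThreadsPerBlock);
    printf("max resident threads/SM:  %d\n", prop.maxThreadsPerMultiProcessor);
    printf("warp size:                %d\n", prop.warpSize);

    // 4096 blocks * 256 threads = 1,048,576 threads in one launch.
    printf("threads that ran:         %llu\n", launch_and_count(4096, 256));
    return 0;
}
```

The per-block limit is a hard one: with this helper, asking for more threads per block than `maxThreadsPerBlock` should come back as 0 because the launch fails with an invalid-configuration error. The total thread count is what scales, since blocks that don't fit on the SMs yet just wait their turn.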