Comment by jms55
1 year ago
Agreed, there are two different problems being described here.
1. Divergence of threads within a workgroup/SM/whatever
2. Dynamically scheduling new workloads (e.g. dispatches, draws) in response to the output of a previous workload
Raytracing is problem #1 (and has its own solutions, like shader execution reordering), while Raph is talking about problem #2.
> Raytracing is problem #1 (and has its own solutions, like shader execution reordering)
The "solution" to Raytracing (ignoring hardware acceleration like shader execution reordering) is stream compaction and stream expansion.
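For anyone who hasn't seen the terms: here is a minimal CPU-side sketch in Python of what compaction and expansion do, assuming the usual prefix-sum formulation. The names compact and expand are mine, not from any GPU API, and a real GPU version would run the scan and the scatter in parallel.

```python
from itertools import accumulate

def compact(items, keep):
    """Stream compaction: pack the items whose keep-flag is set into a dense array."""
    flags = [1 if k else 0 for k in keep]
    # Exclusive prefix sum: offsets[i] = number of kept items before index i.
    offsets = [0] + list(accumulate(flags))[:-1]
    out = [None] * sum(flags)
    # On a GPU, each thread would perform this scatter for its own index.
    for i, item in enumerate(items):
        if flags[i]:
            out[offsets[i]] = item
    return out

def expand(items, counts):
    """Stream expansion: item i emits counts[i] outputs into a dense array."""
    # Exclusive prefix sum again, this time over the per-item output counts.
    offsets = [0] + list(accumulate(counts))[:-1]
    out = [None] * sum(counts)
    for i, item in enumerate(items):
        for j in range(counts[i]):
            out[offsets[i] + j] = (item, j)
    return out
```

Either way the point is the same: after the pass, the live work sits in a dense array with no holes, so the next pass keeps every lane busy.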
If you are willing to have long-running loops inside of a shader (not always possible due to Windows's default 2-second GPU timeout), you can write while(hits_array is not empty) kind of code, allowing your 1024-thread workgroup to keep pulling hits off the array and efficiently process all of the rays as if it were calling them recursively.
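A rough Python simulation of that loop, just to show its shape. Everything here is a stand-in: trace is a hypothetical placeholder for the real ray/scene intersection, a "ray" is only an (id, bounces_left) pair, and WORKGROUP_SIZE is an assumed width.

```python
WORKGROUP_SIZE = 1024  # assumed workgroup width

def trace(ray):
    """Hypothetical stand-in for intersection: returns a follow-up ray or None."""
    ray_id, bounces_left = ray
    if bounces_left == 0:
        return None  # ray terminates, nothing to enqueue
    return (ray_id, bounces_left - 1)

def process_all(initial_rays):
    hits_array = list(initial_rays)
    processed = 0
    passes = 0
    while hits_array:  # while(hits_array is not empty)
        # Pull off up to one workgroup's worth of rays.
        batch = hits_array[:WORKGROUP_SIZE]
        hits_array = hits_array[WORKGROUP_SIZE:]
        # Each "thread" handles one ray from the batch.
        results = [trace(ray) for ray in batch]
        processed += len(batch)
        # Compaction step: only live follow-up rays go back into the array.
        hits_array.extend(r for r in results if r is not None)
        passes += 1
    return processed, passes
```

The "recursion" is just follow-up rays being pushed back into the same array, so no per-thread call stack is needed.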
--------
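The important tidbit is that this technique generalizes. If you have 5 functions that need to be "called" after your current processing, then it becomes a loop over 5 next-arrays, one per function, where each pass runs a function whose array has enough queued work.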
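Something along these lines, again as a hedged Python simulation rather than real shader code. The stage function, its routing rule, and the "pick the fullest array" policy are all assumptions made up for illustration.

```python
WORKGROUP_SIZE = 1024  # assumed workgroup width
NUM_FUNCS = 5

def stage(func_index, value):
    """Hypothetical func1..func5: consume one item, maybe emit one follow-up.

    Returns (target_func_index, new_value), or None when the item is finished.
    The routing rule is arbitrary and only exists to make the demo terminate.
    """
    if value == 0:
        return None
    nxt = value - 1
    return (nxt % NUM_FUNCS, nxt)

def run(initial):
    # One next-array per function.
    next_arrays = [[] for _ in range(NUM_FUNCS)]
    for target, value in initial:
        next_arrays[target].append(value)
    calls = [0] * NUM_FUNCS
    while any(next_arrays):
        # Run the function with the most queued work so the batch is as full as possible.
        k = max(range(NUM_FUNCS), key=lambda i: len(next_arrays[i]))
        batch = next_arrays[k][:WORKGROUP_SIZE]
        next_arrays[k] = next_arrays[k][WORKGROUP_SIZE:]
        for value in batch:
            calls[k] += 1
            out = stage(k, value)
            if out is not None:
                target, nxt = out
                next_arrays[target].append(nxt)  # "call" the next function later
    return calls
```

Every thread in a batch runs the same function, so there is no divergence inside a pass. The cost moves to how full each batch is.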
Now of course we can't grow "too far", because GPUs can't handle divergence very well. But for "small" numbers of next-arrays and "small" amounts of divergence (i.e. I'm assuming that func1 is the most common here, like 80%+, so that the buffers remain full), this technique works.
If you have more divergence than that, then you need to think more carefully about how to continue. Maybe GPUs are a bad fit (e.g. any HTTP server code will be awful on GPUs) and you're forced to use a CPU.