Comment by dragontamer
1 year ago
> Raytracing is problem #1 (and has its own solutions, like shader execution reordering)
The "solution" to raytracing (ignoring hardware acceleration like shader execution reordering) is stream compaction and stream expansion.
    if (ray hit) {
        push(hits_array, currentRay);
    } else {
        push(miss_array, currentRay);
    }
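For anyone wondering how a parallel push() can work at all: one common way to do stream compaction is an exclusive prefix sum over the hit flags, followed by a scatter. Below is a minimal CPU-side sketch of that idea in C++. The name compact and the int payload are made up for illustration, and on a real GPU both loops would run as parallel passes rather than serially.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: keep the elements of 'input' whose flag is nonzero,
// packed densely and in their original order.
std::vector<int> compact(const std::vector<int>& input,
                         const std::vector<int>& flags) {
    assert(input.size() == flags.size());

    // Exclusive prefix sum over the flags: offsets[i] is the output slot
    // that element i will occupy if it survives.
    std::vector<std::size_t> offsets(flags.size());
    std::size_t running = 0;
    for (std::size_t i = 0; i < flags.size(); ++i) {
        offsets[i] = running;
        running += flags[i] ? 1 : 0;
    }

    // Scatter: every surviving element writes to its own slot, so on a GPU
    // no two threads would ever write to the same location.
    std::vector<int> out(running);
    for (std::size_t i = 0; i < input.size(); ++i) {
        if (flags[i]) out[offsets[i]] = input[i];
    }
    return out;
}
```

Running it once with the hit flags and once with the inverted flags gives you the hits array and the miss array.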
If you are willing to run long loops inside of a shader (not always possible, given Windows's default 2-second GPU timeout), you can write while(hits_array is not empty) kind of code, allowing your 1024-thread workgroup to keep pulling from the hits array and efficiently process all of the rays "recursively".
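Here is a rough CPU-side sketch of that while loop in C++, just to show the control flow. Everything in it is an assumption for illustration: the Ray struct, the bounces_left field, and the fake ray_hit test stand in for real intersection code, and the inner for loop is where a GPU would run one thread per ray.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical ray record; bounces_left is a stand-in for real ray state.
struct Ray {
    int id;
    int bounces_left;
};

// Stand-in for a real intersection test (an assumption of this sketch):
// a ray "hits" as long as it still has bounces left.
bool ray_hit(const Ray& r) { return r.bounces_left > 0; }

// Each pass consumes the whole hits array and compacts the surviving rays
// into a fresh one, so every pass iterates over a dense array.
// Returns the number of passes that ran.
int trace_all(std::vector<Ray> rays, std::vector<Ray>& miss_array) {
    miss_array.clear();
    const std::size_t total = rays.size();
    std::vector<Ray> hits_array = std::move(rays);
    int passes = 0;
    while (!hits_array.empty()) {
        std::vector<Ray> next_hits;
        for (const Ray& current : hits_array) {  // on a GPU: one thread per ray
            if (ray_hit(current)) {
                next_hits.push_back(Ray{current.id, current.bounces_left - 1});
            } else {
                miss_array.push_back(current);
            }
        }
        hits_array.swap(next_hits);
        ++passes;
    }
    assert(miss_array.size() == total);  // every ray eventually misses
    return passes;
}
```

The "recursion" is just the loop: a ray that hits goes back into the queue with updated state instead of making a nested call.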
--------
The important tidbit is that this technique generalizes. If you have 5 functions that need to be "called" after your current processing, then it becomes:
    if (func1 needs to be called next) {
        push(func1, dataToContinue);
    } else if (func2 needs to be called next) {
        push(func2, dataToContinue);
    } else if (func3 needs to be called next) {
        push(func3, dataToContinue);
    } else if (func4 needs to be called next) {
        push(func4, dataToContinue);
    } else if (func5 needs to be called next) {
        push(func5, dataToContinue);
    }
Now of course this can't grow "too far", because GPUs can't handle divergence very well. But for "small" numbers of next-arrays and "small" amounts of divergence (i.e., I'm assuming that func1 is the most common here, like 80%+, so that the buffers remain full), this technique works.
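To make the multi-queue version concrete, here is a small CPU-side sketch in C++. It is only one possible arrangement: the Work payload, the kDone marker, the run_stage continuation rule, and the "always drain the fullest queue" policy are all assumptions of the sketch, not something a GPU API gives you. The point of draining the fullest queue first is that each batch is as dense as possible, which is the same reason the buffers need to stay full.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kNumFuncs = 5;  // func1..func5
constexpr int kDone = -1;             // hypothetical "nothing left to call" marker

struct Work { int value; };           // hypothetical dataToContinue payload

// Hypothetical stand-in for func1..func5: does some "work" and returns the
// index of the function to call next, or kDone when the item is finished.
int run_stage(std::size_t stage, Work& w) {
    w.value -= static_cast<int>(stage) + 1;
    if (w.value <= 0) return kDone;
    return w.value % static_cast<int>(kNumFuncs);
}

// Repeatedly drain the fullest queue as one batch; each item is pushed onto
// the queue of whichever function it needs next. Returns the batch count.
std::size_t run_all(std::array<std::vector<Work>, kNumFuncs>& queues) {
    std::size_t batches = 0;
    for (;;) {
        auto fullest = std::max_element(
            queues.begin(), queues.end(),
            [](const std::vector<Work>& a, const std::vector<Work>& b) {
                return a.size() < b.size();
            });
        if (fullest->empty()) break;  // every queue is drained
        const std::size_t stage =
            static_cast<std::size_t>(fullest - queues.begin());
        std::vector<Work> batch;
        batch.swap(*fullest);         // take the whole queue as one batch
        for (Work& w : batch) {       // on a GPU: one thread per item
            const int next = run_stage(stage, w);
            if (next == kDone) continue;
            assert(next >= 0 && next < static_cast<int>(kNumFuncs));
            queues[static_cast<std::size_t>(next)].push_back(w);
        }
        ++batches;
    }
    return batches;
}
```

If the work spreads evenly across all five queues, most batches end up small, which is the CPU-sketch equivalent of a half-empty workgroup.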
If you have more divergence than that, then you need to think more carefully about how to continue. Maybe GPUs are a bad fit (e.g., any HTTP server code will be awful on GPUs) and you're forced to use a CPU.