Comment by fc417fc802
4 months ago
That's just a matter of what's in cache. If your compute shader operates in coherent blocks it should generally be on par with the equivalent fragment shader. The potential exceptions are where access to dedicated hardware functionality is concerned.
What I'm curious about is whether there's a hardware intrinsic that computes derivatives, or whether the implementation of those opcodes is generally in software.
I chose to focus on the fact that the frag stage is already tracking those changes, because at that point it's basically free and you don't need to worry about it too much.
To answer your question, which is very pertinent, they seem to use different hardware-accelerated mechanisms. In the compute stage, wave-based derivatives are used, and you need to account for different lane counts between GPU architectures.
Understanding that now makes me believe you're right. But one needs to benchmark them to be sure.
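For what it's worth, here's a rough CPU-side sketch of how I picture the wave-based derivatives mentioned above. It is plain Python, not shader code, and it only simulates what quad-swap intrinsics such as HLSL's `QuadReadAcrossX` or GLSL's `subgroupQuadSwapHorizontal` let you do on the GPU. The function names (`lane_to_pixel`, `quad_ddx`, `quad_ddy`) and the lane layout are my own assumptions for illustration: I assume every four consecutive lanes form a 2x2 quad, with bit 0 of the lane index as the x offset and bit 1 as the y offset. How lanes actually map to quads depends on the API and the shader setup.

```python
def lane_to_pixel(lane, quads_per_row=4):
    """Map a lane index to a pixel coordinate (hypothetical layout).

    Assumption: lanes 4q..4q+3 form quad q; bit 0 of the lane index is the
    x offset inside the quad, bit 1 is the y offset.
    """
    quad = lane // 4
    x = 2 * (quad % quads_per_row) + (lane & 1)
    y = 2 * (quad // quads_per_row) + ((lane >> 1) & 1)
    return x, y


def quad_ddx(values):
    """Horizontal difference inside each 2x2 quad, one result per lane."""
    if len(values) % 4 != 0:
        raise ValueError("lane count must be a multiple of 4")
    out = []
    for lane in range(len(values)):
        left = lane & ~1   # same row of the quad, x offset 0
        right = lane | 1   # same row of the quad, x offset 1
        out.append(values[right] - values[left])
    return out


def quad_ddy(values):
    """Vertical difference inside each 2x2 quad, one result per lane."""
    if len(values) % 4 != 0:
        raise ValueError("lane count must be a multiple of 4")
    out = []
    for lane in range(len(values)):
        top = lane & ~2     # same column of the quad, y offset 0
        bottom = lane | 2   # same column of the quad, y offset 1
        out.append(values[bottom] - values[top])
    return out


# Try two common wave sizes with a linear function f(x, y) = 3x + 5y.
for wave_size in (32, 64):
    values = [3 * x + 5 * y
              for x, y in (lane_to_pixel(lane) for lane in range(wave_size))]
    print(wave_size, set(quad_ddx(values)), set(quad_ddy(values)))
```

The point of the sketch is that the differences only ever look at neighbours inside a quad, so the arithmetic itself doesn't change with the wave size. What changes between architectures is how many quads fit in a wave and how your workgroup has to be laid out so that quads line up with neighbouring pixels, which is where the lane-count bookkeeping comes in.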