Comment by spookie

9 months ago

In a pixel shader you can manipulate the texture coordinate derivatives to sample just a subset of the whole texture and shade only those pixels (basically the same mechanism as mipmapping, but you can put the "window" wherever you want).
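To make the mipmapping comparison concrete, here is a rough Python sketch of how a sampler typically turns UV derivatives into a mip level. It is a simplified isotropic model, not any particular GPU's exact behaviour, and `mip_level` and its parameters are hypothetical names for illustration. Feeding in larger derivatives (for example through `textureGrad` in GLSL or `SampleGrad` in HLSL) pushes the sample toward a coarser level.

```python
import math

def mip_level(duv_dx, duv_dy, tex_size, num_levels):
    """Approximate isotropic mip selection from UV derivatives (simplified model)."""
    width, height = tex_size
    # Scale the per-pixel UV derivatives into texel units.
    len_x = math.hypot(duv_dx[0] * width, duv_dx[1] * height)
    len_y = math.hypot(duv_dy[0] * width, duv_dy[1] * height)
    # The larger footprint decides how coarse the sample has to be.
    rho = max(len_x, len_y)
    if rho <= 1.0:
        return 0.0  # one texel or less per pixel: use the base level
    # Clamp to the coarsest level that exists.
    return min(math.log2(rho), float(num_levels - 1))

# A 1024x1024 texture with 11 mip levels (hypothetical numbers).
print(mip_level((1 / 1024, 0.0), (0.0, 1 / 1024), (1024, 1024), 11))
print(mip_level((4 / 1024, 0.0), (0.0, 4 / 1024), (1024, 1024), 11))
```

The second call passes derivatives four times larger than the first, so under this model the sample moves two levels down the chain.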

You can't do this in a compute shader, since you don't have access to the built-in derivative functions (and building your own won't be cheaper either).

Still, if you want those changes to persist, a compute shader would be the way to go. You _can_ do it with a pixel shader, but it really is less clean and more of a hack.

fc417fc802 9 months ago

That is true. Hadn't occurred to me because I'd had in mind pixel sorting stuff I did in the past where the fetches and stores aren't contiguous.

Interestingly enough, the derivative functions are available to compute shaders as of SM 6.6. [0] Oddly, SPIR-V only makes the associated opcodes [1] available to the fragment execution model. I'm not sure how something like DXVK handles that.

It's not clear to me whether the associated DXIL or SPIR-V opcodes are actually implemented in hardware. I couldn't immediately find anything relevant in the particular ISA I checked, and I'm nowhere near motivated enough to go digging through the Mesa source code to see how the magic happens. It's relevant because, since you mentioned it, I'm curious how much of a perf hit rolling your own would be.

[0] https://microsoft.github.io/DirectX-Specs/d3d/HLSL_SM_6_6_De...

[1] https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.htm...

spookie 9 months ago

Huh, wasn't aware of that. Nice.

About the performance question: during the fragment shader stage neighbouring pixels are already being tracked (fragments are shaded in 2x2 quads), so calling those functions is almost free. It would be difficult to match that performance in a compute pass.

fc417fc802 9 months ago

That's just a matter of what's in cache. If your compute shader operates in coherent blocks it should generally be on par with the equivalent fragment shader. The potential exceptions are where access to dedicated hardware functionality is concerned.

What I'm curious about is if there's a hardware intrinsic that computes derivatives or if the implementation of those opcodes is generally in software.
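For a sense of what rolling your own involves, here is a minimal Python sketch of coarse per-quad derivatives. It is a CPU model rather than shader code: it assumes values laid out in 2x2 quads and takes one finite difference per quad, roughly in the spirit of HLSL's `ddx_coarse` / `ddy_coarse`. The name `quad_derivatives` and the grid layout are assumptions for illustration, and it says nothing about whether real hardware does this with a dedicated intrinsic.

```python
def quad_derivatives(values):
    """Coarse derivatives over 2x2 quads of a grid with even dimensions.

    values[y][x] is the quantity being differentiated (e.g. a UV component).
    Returns (ddx, ddy), each the same shape as values.
    """
    rows, cols = len(values), len(values[0])
    ddx = [[0.0] * cols for _ in range(rows)]
    ddy = [[0.0] * cols for _ in range(rows)]
    for y0 in range(0, rows, 2):
        for x0 in range(0, cols, 2):
            # One horizontal and one vertical difference per quad,
            # both measured from the quad's top-left element.
            dx = values[y0][x0 + 1] - values[y0][x0]
            dy = values[y0 + 1][x0] - values[y0][x0]
            # Every element in the quad shares the same result.
            for y in (y0, y0 + 1):
                for x in (x0, x0 + 1):
                    ddx[y][x] = dx
                    ddy[y][x] = dy
    return ddx, ddy

# A linear ramp: 0.25 per step in x, 0.5 per step in y.
grid = [[0.25 * x + 0.5 * y for x in range(4)] for y in range(4)]
ddx, ddy = quad_derivatives(grid)
print(ddx[0][0], ddy[0][0])
```

In an actual compute shader the equivalent would need the four quad members to exchange values (through shared memory or wave/subgroup operations), which is where any extra cost over the fragment path would come from.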
  