
Comment by PerryStyle

2 days ago

Do you have any good resources that go into detail on GPU ISAs or GPU architecture? There's certainly a lot available for CPUs, but the resources I’ve found for GPUs mostly focus on how they differ from CPUs and how their ISAs are tailored to the GPU's specific goals.

Unfortunately this is a topic that isn't open enough, and architectures change rather quickly so you're always chasing the rabbit. That being said:

The RDNA architecture slides (a few gens old) have some breadcrumbs: https://gpuopen.com/download/RDNA_Architecture_public.pdf

AMD also publishes its ISAs, but I don't think you'll be able to extract much from a reference-style document: https://gpuopen.com/amd-gpu-architecture-programming-documen...

Books on CUDA/HIP also go into some detail on the underlying architecture. Some slides from NV:

https://gfxcourses.stanford.edu/cs149/fall21content/media/gp...

Edit: I should say that Apple also publishes decent stuff. See the link here and the stuff linked at the bottom of the page. But note that now you're in UMA/TBDR territory; discrete GPUs work considerably differently: https://developer.apple.com/videos/play/wwdc2020/10602/

If anyone has more suggestions, please share.

I assume most people learn microarchitecture for performance reasons.

At which point, the question you are really asking is what aspects of assembly are important for performance.

Answer: there are multiple GPU matrix multiplication examples covering memory channels (especially channel conflicts), load/store alignment, memory movement and more. That should cover the issue I talked about earlier.
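To make "channel conflict" a bit more concrete, here is a toy CPU-side model in Python. It is only a sketch: the channel count (8) and interleave granularity (256 bytes) are made-up numbers, the function names are hypothetical, and real GPUs usually hash address bits rather than using a plain modulo. The idea it shows is that walking down a column of a row-major matrix whose row pitch is a multiple of (channels x interleave) keeps hitting the same channel, while padding the row spreads the traffic out.

```python
from collections import Counter

# Hypothetical memory layout parameters, chosen for illustration only.
NUM_CHANNELS = 8         # assumed number of memory channels
INTERLEAVE_BYTES = 256   # assumed bytes mapped to a channel before moving to the next


def channel_of(addr):
    """Map a byte address to a channel using simple modulo interleaving."""
    return (addr // INTERLEAVE_BYTES) % NUM_CHANNELS


def channel_histogram(base, stride_bytes, count):
    """Count how many of `count` strided accesses land on each channel."""
    return Counter(channel_of(base + i * stride_bytes) for i in range(count))


def channels_touched(base, stride_bytes, count):
    """Number of distinct channels used by the access pattern."""
    return len(channel_histogram(base, stride_bytes, count))


if __name__ == "__main__":
    floats_per_row = 512
    pitch = floats_per_row * 4          # 2048 bytes = NUM_CHANNELS * INTERLEAVE_BYTES
    padded_pitch = (floats_per_row + 64) * 4  # 2304 bytes, one extra interleave unit

    # Walking down one column: every access is one row pitch apart.
    print("unpadded:", dict(channel_histogram(0, pitch, 64)))
    print("padded:  ", dict(channel_histogram(0, padded_pitch, 64)))
```

With these assumed parameters the unpadded column walk lands entirely on one channel, and the padded one spreads evenly over all eight. The matrix multiplication write-ups go through the same reasoning with the real numbers for a given card.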

Optimization guides help. I know it's 10+ years old, but I think AMD's OpenCL optimization guide was easy to read and follow, and it's still modern enough to cover most of today's architectures.

Beyond that, you'll have to look at conference talks on the newer DirectX 12 instructions (wave instructions, ballot/voting, etc.) and their performance implications.
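For anyone who hasn't met ballot/voting before, here is a small CPU-side emulation of the semantics in Python. It is a sketch, not GPU code: the function names are hypothetical, and a 32-lane wave is assumed (wave width varies by vendor and mode). On real hardware the equivalents are intrinsics such as HLSL's WaveActiveBallot or CUDA's __ballot_sync, which do this across the lanes of a wave in hardware instead of looping.

```python
WAVE_SIZE = 32  # assumed wave width; real hardware may use 32 or 64 lanes


def ballot(predicates):
    """Pack one boolean per lane into a bitmask; bit i is set if lane i voted true."""
    mask = 0
    for lane, vote in enumerate(predicates):
        if vote:
            mask |= 1 << lane
    return mask


def wave_any(predicates):
    """True if at least one lane voted true."""
    return ballot(predicates) != 0


def wave_all(predicates):
    """True if every lane voted true."""
    return ballot(predicates) == (1 << len(predicates)) - 1


def count_votes(mask):
    """Number of lanes that voted true (population count of the mask)."""
    return bin(mask).count("1")


if __name__ == "__main__":
    # Example: each lane holds one value and votes on whether it is even.
    values = list(range(WAVE_SIZE))
    votes = [v % 2 == 0 for v in values]
    mask = ballot(votes)
    print(hex(mask), count_votes(mask), wave_any(votes), wave_all(votes))
```

The performance angle is that the whole wave learns the combined result in one operation, so things like "does any lane need the slow path?" or compaction offsets can be computed without going through shared memory.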

It's a mixed bag: everyone knows one or two optimization techniques, but learning all of them takes a lot of study.