Comment by sxzygz
13 days ago
Uuugh, graphics. So many smart people expending great energy to look busy while doing nothing particularly profound.
Graphics people, here is what you need to do.
1) Figure out a machine abstraction.
2) Figure out an abstraction for how these machines communicate with each other and the CPU on a shared memory bus.
3) Write a binary spec for code for this abstract machine.
4) Compilers target this abstract machine.
5) Programs submit code to driver for AoT compilation, and cache results.
6) Driver has some linker and dynamic module loading/unloading capability.
7) Signal the driver to start that code.
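To make steps 5 to 7 concrete, here is a toy sketch of what such a driver interface could look like. Everything in it is hypothetical: `ToyDriver` and its `submit`, `load`, `unload` and `start` methods are invented names, not any vendor's API, and the "compilation" is just a tag prepended to the bytes.

```python
import hashlib


class ToyDriver:
    """Hypothetical stand-in for a vendor driver; not a real API."""

    def __init__(self):
        self.cache = {}         # content hash -> "compiled" artifact
        self.loaded = {}        # module name -> compiled artifact
        self.compile_count = 0  # how many times we actually compiled

    def submit(self, binary: bytes) -> str:
        """Step 5: AoT-compile abstract-machine code, cached by content hash."""
        key = hashlib.sha256(binary).hexdigest()
        if key not in self.cache:
            self.compile_count += 1
            # A real driver would lower to the native ISA here.
            self.cache[key] = b"native:" + binary
        return key

    def load(self, name: str, key: str) -> None:
        """Step 6: make a compiled module available under a name."""
        self.loaded[name] = self.cache[key]

    def unload(self, name: str) -> None:
        """Step 6: drop a previously loaded module."""
        del self.loaded[name]

    def start(self, name: str) -> str:
        """Step 7: signal the driver to start the loaded code."""
        if name not in self.loaded:
            raise KeyError(f"module {name!r} is not loaded")
        return f"started {name} ({len(self.loaded[name])} bytes)"
```

Submitting the same bytes twice returns the same key and compiles only once, which is the caching behaviour step 5 asks for.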
AMD64, ARM, and RISC-V are all basically differing binary specs for a C-machine+MMU+MMIO compute abstraction.
Figure out your machine abstraction and let us normies write code that’s accelerated without having to throw the baby out with the bathwater every few years.
Oh yes, give us timing information so we can adapt workload as necessary to achieve soft real-time scheduling on hardware with differing performance.
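As a rough illustration of adapting workload to timing information, the sketch below rescales the next batch size so that a measured run time moves toward a time budget. The function name `next_batch_size`, its parameters, and the assumption that run time scales roughly linearly with batch size are all mine, not something a driver provides.

```python
def next_batch_size(current: int, measured_ms: float, budget_ms: float,
                    minimum: int = 1, maximum: int = 1 << 20) -> int:
    """Scale the batch size so the next run lands near the time budget.

    Assumes run time is roughly proportional to batch size, which is
    only an approximation on real hardware.
    """
    if measured_ms <= 0:
        # No usable timing sample; keep the current size.
        return current
    scaled = int(current * budget_ms / measured_ms)
    # Clamp so one bad sample cannot drive the size to zero or to infinity.
    return max(minimum, min(maximum, scaled))
```

If a batch of 1000 items took 20 ms against a 10 ms budget, the next batch would be 500 items; on faster hardware where it took 5 ms, it would grow to 2000.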
They have done it. The current modern abstraction is called Vulkan, and the binary spec code for this machine is called SPIR-V.
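For a sense of what "binary spec" means here: a SPIR-V module is a stream of 32-bit words that starts with a five-word header (magic number 0x07230203, version, generator, ID bound, reserved schema word). A minimal sketch of reading that header, where `parse_spirv_header` is a helper name made up for this example:

```python
import struct

SPIRV_MAGIC = 0x07230203  # first word of every SPIR-V module


def parse_spirv_header(blob: bytes) -> dict:
    """Read the five-word SPIR-V header from the start of a module."""
    if len(blob) < 20:
        raise ValueError("too short to contain a SPIR-V header")
    # The magic number also tells us the byte order of the word stream.
    if struct.unpack_from("<I", blob, 0)[0] == SPIRV_MAGIC:
        endian = "<"
    elif struct.unpack_from(">I", blob, 0)[0] == SPIRV_MAGIC:
        endian = ">"
    else:
        raise ValueError("not a SPIR-V module (bad magic number)")
    _, version, generator, bound, schema = struct.unpack_from(endian + "5I", blob, 0)
    return {
        # Version word layout: 0 | major | minor | 0 (one byte each).
        "version": ((version >> 16) & 0xFF, (version >> 8) & 0xFF),
        "generator": generator,
        "bound": bound,    # all result IDs in the module are below this
        "schema": schema,  # reserved, expected to be 0
    }
```

Everything after these 20 bytes is the instruction stream, which this sketch does not decode.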
Wow, you should get NVIDIA, AMD and Intel on the phone ASAP! Really strange that they didn't come up with such a simple and straightforward idea in the last 3 decades ;)
I don’t know which of my detractors to respond to, so I’ll respond here.
It should be clear that I’m only interested in compute and not a GPU expert.
GPUs, from my understanding, have lost the majority of fixed-function units as they’ve become more programmable. Furthermore, GPUs clearly have a hidden scheduler and this is not fully exposed by vendors. In other words we have no control over what is being run on a GPU at any given instant, we simply queue work for it.
Given all these contrivances, why shouldn't the interface exposed to the user be absolutely simple? It should then be up to vendors to produce hardware (and co-designed compilers) to run our software as fast as possible.
Graphics developers need to develop a narrow-waist abstraction for wide, latency-hiding, SIMD compute. On top of this Vulkan, or OpenGL, or ML inference, or whatever can be done. The memory space should also be fully unified.
This is what needs to be worked on. If you don’t agree, that’s fine, but don’t pretend that you’re not protecting entrenched interests from the likes of Microsoft, Nvidia, Epic Games, Valve and others.
Telling people to just use Unreal Engine, or Unity, or even Godot, is just like telling people to just use Python, or TypeScript, or Go to get their sequential compute done.
Expose the compute!
> GPUs, from my understanding, have lost the majority of fixed-function units as they’ve become more programmable.
That would be nice, but unfortunately it doesn't match reality: new fixed-function units are even added from time to time (e.g. for raytracing).
Texture sampling units also seem to be critical for performance and probably won't go away for a while.
It should be possible to hide a lot of the fixed-function magic behind high level GPU instructions (e.g. for sampling a texture), but GPU vendors still don't agree about details like how the texture and sampler properties are managed on the GPU (see: https://www.gfxstrand.net/faith/blog/2022/08/descriptors-are...).
I.e. the problem isn't in the software but in the differing hardware designs. GPU vendors don't seem to like the idea of harmonizing their GPU architectures, and they're also not fans of creating a common ISA as a compatibility shim (as is common for CPUs). Instead, the 3D API, driver and high-level shader bytecode (e.g. SPIR-V) form this common interface, and that's how we landed in the current situation with all its downsides (most of the reasons are probably not even technical, but legal/strategic - patents and stuff).
Thanks for the link to the post. I also watched her talk posted elsewhere in these comments. We’re lucky to have people like her doing the hard work for free software.
> most of the reasons are probably not even technical, but legal/strategic - patents and stuff
I think fighting for specified interoperable interfaces is important and we must be vigilant against forces that undermine this, either knowingly or through ignorance.
some of this is what khronos standards are theoretically supposed to achieve.
surprise, it's very difficult to do across many hw vendors and classes of devices. it's not a coincidence that metal is much easier to program for.
maybe consider joining khronos since you apparently know exactly how to achieve this very simple goal...
> it's not a coincidence that metal is much easier to program for
Tbf, Metal also works on non-Apple GPUs and with only minimal additional hints to manage resources in non-unified memory.
It sounds like webgl + wasm.