Comment by armchairhacker

1 year ago

> The GPU in your computer is about 10 to 100 times more powerful than the CPU, depending on workload. For real-time graphics rendering and machine learning, you are enjoying that power, and doing those workloads on a CPU is not viable. Why aren’t we exploiting that power for other workloads? What prevents a GPU from being a more general purpose computer?

What other workloads would benefit from a GPU?

Computers are so fast that, in practice, many tasks don't need more performance. If a program that runs those tasks is slow, it's usually because that program's code is particularly bad, and fixing the bad code is simpler than rewriting it for the GPU.

For example, GUIs have responded to user input with imperceptible latency for over 20 years. If an app's GUI feels sluggish, the problem is usually that the app's actions and rendering aren't on separate coroutines, or that the action's coroutine is blocking (maybe it needs to be on a separate thread). But the rendering part of the GUI doesn't need to be on a GPU (any more than it is today; I admit I don't know much about rendering), because responsive GUIs exist today, some even written in scripting languages.

In some cases, parallelizing a task intrinsically makes it slower, because the sequential operations required for coordination mean there are more forced-sequential operations in total. In other cases, a program spawns 1000+ threads but they only run on 8-16 processors, so the program would be faster if it spawned fewer threads: it would still use all the processors while paying less overhead for thread creation and scheduling.

I do think GPU programming should be made much simpler, so this work is probably useful, but mainly to ease the implementation of tasks that already use the GPU: real-time graphics and machine learning.

Possibly compilation and linking. That's very slow for big programs like Chromium. There's really interesting work on GPU compilers (co-dfns and Voetter's work).

Optimization problems like scheduling and circuit routing. Search in theorem proving (the classical parts like model checking, not just LLM).

There's still a lot that is slow and should be faster, or at the very least made to run using less power. GPUs are good at that for graphics, and I'd like to see those techniques applied more broadly.

  • All of these things you mention are "thinking", meaning they require complex algorithms with a bunch of branches and edge cases.

    The tasks that GPUs are good at right now - graphics, number crunching, etc - are all very simple algorithms at the core (mostly elementary linear algebra), and the problems are, in most cases, embarrassingly parallel.

    CPUs are not very good at branching either - see all the effort being put towards getting branch prediction right - but they are way better at it than GPUs. The main appeal of GPGPU programming is, in my opinion, that if you can get the CPU to efficiently divide the larger problem into a lot of small, simple subtasks, you can achieve faster speeds.

    You mentioned compilers. For a related example, see all the work Daniel Lemire has been doing on SIMD parsing: the algorithms he (co)invented are all highly specialized to the language being parsed, and highly nontrivial. Branchless programming requires an entirely different mindset/intuition than "traditional" programming, and I wouldn't expect the average programmer to come up with such novel ideas.

    A GPU is a specialized tool that is useful for a particular purpose, not a silver bullet to magically speed up your code. There is a reason that we are using it for its current purposes.

  • > Possibly compilation and linking. That's very slow for big programs like Chromium.

    So instead of fixing the problem (Chromium's bloat), we just throw more memory and computing power at it, hoping that the problem will go away.

    Maybe we should teach programmers to program. /s

A big one is video encoding. It seems like GPUs would be ideal for it but in practice limitations in either the hardware or programming model make it hard to efficiently run on GPU shader cores. (GPUs usually include separate fixed-function video engines but these aren't programmable to support future codecs.)

  • Video encoding is done with fixed-function hardware for power efficiency. A new popular codec (H.264, H.265, and the like) appears only every 5-10 years, so there is no real need to support future ones.

    • Video encoding is really two domains, and there's surprisingly little overlap between them.

      The first is real-time video encoding: video conferencing, live television broadcasts. This is done with fixed-function hardware not just for power efficiency, but also for latency.

      The second domain is encoding at rest: YouTube, Netflix, Blu-ray, etc. This is usually done in software on the CPU for compression-ratio efficiency.

      The problem with fixed-function video encoding is that the compression ratio is bad: you get enormous files, or awful video quality, or both. The problem with software video encoding is that it's really slow. OP is asking why we can't/don't have the best of both worlds: why can't/don't we write a video encoder in OpenCL/CUDA/ROCm, so that we get the speed of the GPU's compute capability with the compression ratio of software?