Comment by ryao

2 months ago

> ROCm isn't part of AMD drivers, it's a software library that helps you support legacy compute APIs and stuff in the BLAS/GEMM/LAPACK end of things.

AMD says otherwise:

> AMD ROCm™ is an open software stack including drivers, development tools, and APIs that enable GPU programming from low-level kernel to end-user applications.

https://www.amd.com/en/products/software/rocm.html

The issues involving AMD hardware not only applied to the drivers, but to the firmware below the drivers:

https://www.tomshardware.com/pc-components/gpus/amds-lisa-su...

Tinygrad’s software looks like a userland driver:

https://github.com/tinygrad/tinygrad/blob/master/tinygrad/ru...

It loads various firmware blobs, manages part of the initialization process, manages memory, writes to registers, et cetera. These are all things a driver does.
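To illustrate what "writes to registers" can look like from userspace, here is a minimal sketch, not tinygrad's actual code. It assumes the device exposes a memory-mapped region as a file (on Linux, PCI BARs appear as `/sys/bus/pci/devices/<BDF>/resource0`). The register offsets and helper names are hypothetical, and an ordinary file can stand in for the BAR so the sketch runs without hardware.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical 32-bit register offsets, for illustration only.
REG_CONTROL = 0x00
REG_STATUS = 0x04


def open_region(path, size):
    """Map `size` bytes of `path` read/write and return the mapping."""
    fd = os.open(path, os.O_RDWR)
    try:
        # The mapping stays valid after the descriptor is closed.
        return mmap.mmap(fd, size)
    finally:
        os.close(fd)


def write_reg(region, offset, value):
    """Store a little-endian 32-bit value at `offset` in the mapping."""
    struct.pack_into("<I", region, offset, value)


def read_reg(region, offset):
    """Load a little-endian 32-bit value from `offset` in the mapping."""
    return struct.unpack_from("<I", region, offset)[0]


def make_fake_bar(size=4096):
    """Create a zero-filled file that stands in for a device BAR."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"\x00" * size)
        return f.name
```

With real hardware, `open_region` would be pointed at the device's resource file (which typically requires root) instead of the file from `make_fake_bar`. Everything else a driver does, such as firmware loading and memory management, builds on reads and writes like these.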

AMD is extremely bad at communication. The driver already contains everything ROCm requires to talk to the GPU, and ROCm itself is only an SDK that contains runtimes, libraries, and compilers.

This part of TinyGrad is not a driver; however, it tries to hijack the process to do part of that task. You cannot boot the system with this, and it does not replace any part of the Mesa/DRI/DRM/KMS/etc. stack. It does reinitialize the hardware with different firmware, which might be why you think this is a driver.

  • I consider it to be a driver, or at least part of one. Userspace drivers exist. Graphics drivers were originally entirely in userspace, until portions of them were moved into the kernel for kernel mode setting and DRM. These days, graphics drivers have both kernel mode and user mode components. The shader compiler, for example, would be a user mode component.

    • I'm aware. One of the biggest things in fixing the Linux desktop was no longer needing drivers in the X server, and no longer needing it to be suid root.

      What was linked is written in Python. Nothing in Python is ever going to be a userland driver.

https://community.amd.com/t5/ai/what-s-new-in-amd-rocm-6-4-b...

ROCm 6.4 software introduces the Instinct GPU Driver, a modular driver architecture that separates the kernel driver from ROCm user space.

  • They were doing this before. The difference is that, previously, the version of ROCm you used was locked to the driver versions it supported, which was a very narrow range.

    With this new thing, the backend API is now formalized, which makes it easier to support a wider range of versions.