← Back to context

Comment by badsectoracula

10 months ago

If you don't mind going through the eldritchian horror that is building ROCm from source[0], Qwen_Qwen3-30B-A3B-Q6_K (6bit quantization of the LLM mentioned in the article which in practice shouldn't be much different) works decently fast on a RX 7900 XTX using koboldcpp and llama.cpp. And by "decently fast" i mean "it writes faster i can read".

If you're on Debian AFAIK AMD is paying someone to experience the pain in your place, so that is an option if you're building something from scratch, but my openSUSE Tumbleweed installation predates the existence of llama.cpp by a few years and i'm not subjecting myself to the horror that is Python projects (mis)managed by AI developers[1] :-P.

EDIT: my mistake, ROCm isn't needed (or actually, supported) by koboldcpp, it uses Vulkan. ROCm is available via a fork. Still, with Vulkan it is fast too.

[0] ...and more than once as after some OS upgrade it might break, like mine

[1] ok, i did it once, because recently i wanted to try out some tool someone wrote that relied on some AI stuff and i was too stubborn to give up - i had to install Python from source on a Debian docker container because some dependency 2-3 layers deep didn't compile with a newer minor version release of Python. It convinced me to thank yet again to thank Georgi Gerganov for making AI-related tooling that enables people to stick with C++

If you are on Debian, ROCm is already packaged in Debian 13 (Trixie).

llama.cpp can be built using Debian-supplied libraries with ROCm backend enabled.

  • Yeah, as i wrote "if you're on Debian AFAIK AMD is paying someone to experience the pain in your place" :-).

    I used to use Debian at the past but when i was about to install my current OS i already had the openSUSE Tumbleweed installer in a USB so i went with that. Ultimately i just needed "a Linux" and didn't care which. I do end up building more stuff from source than when i used Debian but TBH the only time that annoyed me was with ROCm because it is broken into 2983847283 pieces, many of them have their own flags for the same stuff, some claim they allow to install them anywhere but in practice can only work via the default in "/opt", and a bunch of them have their own special snowflake build process (including one that downloads some random stuff via a script through the build process - IIRC a Gentoo packager made a bug report about it to remove the need to download stuff, but i'm not sure if it has been addressed or not).

    If i was doing a fresh OS install i'd probably go with Gentoo - it packages ROCm like Debian, but AFAICT (i haven't tried it) it also provides some tools for you to make bespoke patches to packages you install that survive updates and i'd like to do some customizations on stuff i install.

Yesterday I was successively using olama installed qwen3:32b and drove it using Simon Willison's llm tool (https://llm.datasette.io/en/stable/). Using CPU only, it ran (if you can call moving at the speed of a walker running) and sucked up almost all of my 32 GB ram.

My laptop has dual (and dueling) graphics chips, Intel and Quadro K1200M with 4 GB of RAM. I will need to learn more about LLM setup, so maybe I can torture myself getting the Nvidia driver working on Linux and experiment with that.