
Comment by gunalx

3 days ago

You would still need drivers and all the stuff that's difficult with Nvidia on Linux with an eGPU. (It's not necessarily terrible, just suboptimal.) Rather just add the second GPU to the workstation, or just run the LLM on your AMD GPU.

Oh, we can run LLMs efficiently with AMD GPUs now? Pretty cool, I haven't been following, thank you.

  • I've been running LLMs on my Radeon 7600 XT 16GB for the past 2-3 months without issues (Windows 11). I've been using llama.cpp only. The only thing from AMD I installed (apart from the latest Radeon drivers) is the "AMD HIP SDK" (a very straightforward installer). After unzipping llama.cpp (the zip from the GitHub releases page must contain hip-radeon in the name), all I do is this:

    llama-server.exe -ngl 99 -m Qwen3-14B-Q6_K.gguf

    And then I connect to llama.cpp in a browser at localhost:8080 for the WebUI (it's basic but does the job; screenshots can be found on Google). You can connect more advanced interfaces to it because llama.cpp actually has an OpenAI-compatible API (there's a sketch of calling it at the end of this thread).

  • Yes - I'm running LM Studio on Windows on a 6800 XT, and everything works more or less out of the box, always using the Vulkan llama.cpp backend on the GPU, I believe.

    There's also ROCm. That's not working for me in LM Studio at the moment. I used it early last year to get some LLMs and Stable Diffusion running. As far as I can tell, it was faster before, but Vulkan implementations have caught up or something, so the mucking about often isn't worth it. I believe ROCm is hit or miss for a lot of people, especially on Windows.

  • IDK about "efficiently", but we've been able to run LLMs locally with AMD for 1.5-2 years now.

  • llama.cpp and LM Studio have a Vulkan backend, which is pretty fast. I'm using it to run models on a Strix Halo laptop and it works pretty well.
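
For anyone who wants to try the Vulkan route from the last two replies: here is a minimal sketch of building llama.cpp with the Vulkan backend, assuming a recent checkout where the backend is enabled via the GGML_VULKAN CMake option (the prebuilt Vulkan zips on the GitHub releases page are an alternative), and reusing the Qwen3 GGUF from the first reply as a stand-in for whatever model you have:

    # build llama.cpp with the Vulkan backend
    cmake -B build -DGGML_VULKAN=ON
    cmake --build build --config Release
    # offload all layers to the GPU, same flags as the HIP example above
    ./build/bin/llama-server -ngl 99 -m Qwen3-14B-Q6_K.gguf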
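
As for the OpenAI-compatible API mentioned in the first reply: llama-server exposes it on the same port as the WebUI, so a rough request (written for a Unix-style shell; adjust the quoting for cmd/PowerShell) looks something like this. The server answers with whatever GGUF it was started with, so no model field is needed:

    curl http://localhost:8080/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{"messages": [{"role": "user", "content": "Hello"}]}'

Any client that speaks the OpenAI chat-completions format can be pointed at http://localhost:8080/v1 the same way.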