
Comment by gunalx

3 days ago

You would still need drivers and all the stuff that's difficult with Nvidia on Linux with an eGPU. (It's not necessarily terrible, just suboptimal.) Rather just add the second GPU to the workstation, or just run the LLM on your AMD GPU.

Oh, we can run LLMs efficiently with AMD GPUs now? Pretty cool, I haven't been following, thank you.

  • I've been running LLMs on my Radeon 7600 XT 16GB for the past 2-3 months without issues (Windows 11). I've been using llama.cpp only. The only thing from AMD I installed (apart from the latest Radeon drivers) is the "AMD HIP SDK" (a very straightforward installer). After unzipping llama.cpp (the zip from the GitHub releases page must contain hip-radeon in the name), all I do is this:

    llama-server.exe -ngl 99 -m Qwen3-14B-Q6_K.gguf

    And then I connect to llama.cpp in a browser at localhost:8080 for the WebUI (it's basic but does the job; screenshots can be found on Google). You can connect more advanced interfaces to it because llama.cpp actually has an OpenAI-compatible API (there's a sketch of calling it at the end of this thread).

  • Yes - I'm running LM Studio on Windows on a 6800 XT, and everything works more or less out of the box, always using the Vulkan llama.cpp backend on the GPU, I believe.

    There's also ROCm. That's not working for me in LM Studio at the moment. I used it early last year to get some LLMs and Stable Diffusion running. As far as I can tell, it was faster before, but Vulkan implementations have caught up or something, so the mucking about often isn't worth it. I believe ROCm is hit or miss for a lot of people, especially on Windows.

  • IDK about "efficiently", but we've been able to run LLMs locally with AMD for 1.5-2 years now.

  • llama.cpp and LM Studio have a Vulkan backend, which is pretty fast. I'm using it to run models on a Strix Halo laptop and it works pretty well.
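
For anyone who wants to try the Vulkan route from the last two replies: here is a minimal sketch of building llama.cpp with the Vulkan backend, assuming a recent checkout where the backend is enabled via the GGML_VULKAN CMake option (the prebuilt Vulkan zips on the GitHub releases page are an alternative), and reusing the Qwen3 GGUF from the first reply as a stand-in for whatever model you have:

    # build llama.cpp with the Vulkan backend
    cmake -B build -DGGML_VULKAN=ON
    cmake --build build --config Release
    # offload all layers to the GPU, same flags as the HIP example above
    ./build/bin/llama-server -ngl 99 -m Qwen3-14B-Q6_K.gguf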
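
As for the OpenAI-compatible API mentioned in the first reply: llama-server exposes it on the same port as the WebUI, so a rough request (written for a Unix-style shell; adjust the quoting for cmd/PowerShell) looks something like this. The server answers with whatever GGUF it was started with, so no model field is needed:

    curl http://localhost:8080/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{"messages": [{"role": "user", "content": "Hello"}]}'

Any client that speaks the OpenAI chat-completions format can be pointed at http://localhost:8080/v1 the same way.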