Comment by bigyabai

6 months ago

Those Qwen3 2507 models are the local creme-de-la-creme right now. If you've got any sort of GPU and ~32 GB of RAM to play with, the A3B one is great for pair-programming tasks.

23 comments

pdimitar 6 months ago

Do you happen to know if it can be run via an eGPU enclosure with, e.g., an RTX 5090 inside, under Linux?

I've been considering buying a Linux workstation lately, and I want it to be all AMD. But if I can just plug in an NVIDIA card via an eGPU enclosure for self-hosting LLMs, that would be amazing.

oktoberpaard 6 months ago
I’m running Ollama on 2 eGPUs over Thunderbolt. Works well for me. You’re still dealing with an NVIDIA device, of course. The connection type is not going to change that hassle.
- pdimitar 6 months ago
  
  Thank you for the validation. As much as I don't like NVIDIA's shenanigans on Linux, having a local LLM is very tempting and I might put my ideological problems to rest over it.
  Though I have to ask: why two eGPUs? Is the LLM software smart enough to be able to use any combination of GPUs you point it at?
  
bigyabai 6 months ago

Sure, though you'll be bottlenecked by the interconnect speed if you're tiling between system memory and the dGPU memory. That shouldn't be an issue for the 30B model, but would definitely be an issue for the 480B-sized models.
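A rough back-of-the-envelope sketch of the point above, in Python. It assumes the worst case where every weight that does not fit in VRAM is streamed over the link once per generated token; real runtimes may instead run offloaded layers on the CPU, so treat this as an upper-bound illustration only. The function names, the 4-bit quantization, the 32 GB of VRAM, and the ~4 GB/s link figure are all assumptions for illustration, not measurements. KV cache and activations are ignored.

```python
def offloaded_fraction(total_params_billion, bits_per_weight, vram_gb):
    """Fraction of the model's weights that do not fit in VRAM (0.0 if it all fits)."""
    model_gb = total_params_billion * bits_per_weight / 8
    return max(0.0, 1.0 - vram_gb / model_gb)


def max_tokens_per_second(active_params_billion, bits_per_weight, fraction_offloaded, link_gb_per_s):
    """Upper bound on decode speed if offloaded active weights cross the link once per token.

    Assumes the active (per-token) weights are spread evenly between VRAM and
    system memory, which is a simplification for mixture-of-experts models.
    """
    bytes_per_token = active_params_billion * 1e9 * bits_per_weight / 8 * fraction_offloaded
    if bytes_per_token == 0:
        return float("inf")  # nothing crosses the link, so it is not the bottleneck
    return link_gb_per_s * 1e9 / bytes_per_token


# Assumed figures: 4-bit weights, 32 GB of VRAM, ~4 GB/s usable over the eGPU link.
small = offloaded_fraction(30, 4, 32)    # 30B total parameters, 3B active per token
large = offloaded_fraction(480, 4, 32)   # 480B total parameters, 35B active per token

print("30B-class link-bound tokens/s:", max_tokens_per_second(3, 4, small, 4))
print("480B-class link-bound tokens/s:", max_tokens_per_second(35, 4, large, 4))
```

Under these assumptions the 30B model fits entirely in VRAM, so the link never enters the per-token path, while the 480B-sized model would be limited to well under one token per second by the link alone.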
gunalx 6 months ago
You would still need the drivers and everything else that is difficult about NVIDIA on Linux with an eGPU. (It's not necessarily terrible, just suboptimal.) I'd rather just add the second GPU to the workstation, or just run the LLM on your AMD GPU.
- pdimitar 6 months ago
  
  Oh, we can run LLMs efficiently with AMD GPUs now? Pretty cool, I haven't been following, thank you.
  

indigodaddy 6 months ago

Do we get these good Qwen models when using the qwen-code CLI tool and authenticating via a qwen.ai account?

bigyabai 6 months ago

I'm not sure, probably?
esafak 6 months ago
You do not need qwen-code or qwen.ai to use them; OpenRouter + opencode suffice.
- indigodaddy 6 months ago
  
  Right, I'm aware, was just wondering about that specific scenario.
  

decide1000 6 months ago

I use it on a 24 GB Tesla P40 GPU. Very happy with the results.

hkt 6 months ago
Out of interest, roughly how many tokens per second do you get on that?
- edude03 6 months ago
  
  Like 4. Definitely single digit. The P40s are slow af
  

tomr75 6 months ago

With qwen code?