Comment by adornKey

1 month ago

I'm running a server in the 5K league, and the results are very good. I get about 150 tokens/s from Qwen3 for coding, and about 50 tokens/s from the newer non-MoE Qwens.

I wouldn't bother with less than 32GB of VRAM. With 16GB you can already run something usable, but 32GB gives you much more power. 9B and 14B are only interesting if you want to tune models yourself. The sweet spot now seems to be around 27B-35B.
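
For anyone who wants to sanity-check the VRAM numbers, here is a rough back-of-the-envelope sketch. It only counts the weights (parameters times bits per weight) and adds a flat headroom for KV cache and runtime buffers. The 4 GB headroom, the choice of 4-bit and 8-bit quantization, and the function names are my own assumptions for illustration, not measurements, and real usage depends on context length and the runtime.

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits(params_billion: float, bits_per_weight: float,
         vram_gb: float, headroom_gb: float = 4.0) -> bool:
    """True if weights plus an assumed flat headroom fit in the given VRAM.

    headroom_gb is a hypothetical allowance for KV cache and buffers;
    VRAM is treated as GiB for simplicity.
    """
    return weight_gib(params_billion, bits_per_weight) + headroom_gb <= vram_gb


if __name__ == "__main__":
    # Model sizes mentioned in the comment, at two assumed quantization levels.
    for size in (9, 14, 27, 35):
        for bits in (4, 8):
            need = weight_gib(size, bits)
            print(f"{size}B @ {bits}-bit: ~{need:.1f} GiB weights, "
                  f"fits 16GB: {fits(size, bits, 16)}, "
                  f"fits 32GB: {fits(size, bits, 32)}")
```

Under these assumptions a 27B-35B model at 4-bit leaves room to spare on 32GB but is tight or too big on 16GB, which is roughly in line with what I see in practice.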