
Comment by kekePower

6 months ago

I have an RTX 3070 with 8GB of VRAM, and for me Qwen3:30B-A3B is fast enough. It's not lightning fast, but it's more than adequate if you have a _little_ patience.

I've found that Qwen3 is generally very good at following instructions, and you can easily toggle reasoning by adding "/no_think" to the prompt to turn it off.
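
If it helps, here's a minimal sketch of that toggle against a local Ollama server. The default port and the "qwen3:30b-a3b" tag are assumptions about your setup; the "/no_think" soft switch itself is Qwen3 behavior:

```python
# Minimal sketch: toggling Qwen3's reasoning via the "/no_think" soft switch.
# Assumes a local Ollama server on its default port (11434) and a model pulled
# as "qwen3:30b-a3b" -- adjust both for your setup.
import json
import urllib.request

def ask(prompt: str) -> str:
    payload = json.dumps({
        "model": "qwen3:30b-a3b",
        "prompt": prompt,
        "stream": False,  # one JSON object back instead of a token stream
    }).encode("utf-8")
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Default: the reply starts with a <think>...</think> reasoning block.
print(ask("Why is the sky blue?"))

# Appending the soft switch skips the reasoning block entirely.
print(ask("Why is the sky blue? /no_think"))
```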

The reason Qwen3:30B works so well is that it's a MoE (Mixture of Experts), with only about 3B of its 30B parameters active per token. I've tested the 14B model, and it's noticeably slower because it's a dense model.
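
To put rough numbers on that: token generation is mostly memory-bandwidth-bound, so speed tracks how many bytes of weights each token has to touch. A back-of-envelope sketch, where the bandwidth and bits-per-weight figures are assumptions rather than measurements:

```python
# Back-of-envelope: why 3B active parameters beat a 14B dense model when
# decoding is memory-bandwidth-bound. Both constants below are assumptions.
BYTES_PER_WEIGHT = 0.56   # ~4.5 bits/weight, a Q4-style quant
BANDWIDTH_GB_S = 50.0     # assumed effective bandwidth once weights spill to system RAM

def tokens_per_second(active_params_billions: float) -> float:
    # Each decoded token reads every active weight roughly once.
    bytes_per_token = active_params_billions * 1e9 * BYTES_PER_WEIGHT
    return BANDWIDTH_GB_S * 1e9 / bytes_per_token

print(f"30B-A3B (3B active): ~{tokens_per_second(3):.0f} tok/s")   # ~30
print(f"14B dense:           ~{tokens_per_second(14):.0f} tok/s")  # ~6
```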

How are you getting Qwen3:30B-A3B running with 8GB? On my system it takes 20GB of VRAM to launch it.

  • It offloads to system memory, but since there are "only" 3 billion active parameters, it works surprisingly well. I've been able to run models up to 29GB in size on my system with 32GB of RAM, albeit very, very slowly.

  • Probably offloading to regular RAM, I'd wager. Or it's really, really aggressively quantized: Qwen3:30B-A3B at Q1 with a 1k Q4 context uses 5.84GB of VRAM (see the back-of-envelope sketch after this list).
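
For what it's worth, the sizes in this thread line up with simple weight math. A rough estimator, where the bits-per-weight values are ballpark figures for llama.cpp-style quants rather than exact specs:

```python
# Rough VRAM math for the figures in this thread. Bits-per-weight values are
# ballpark averages for llama.cpp-style quants (assumptions, not specs), and
# the KV cache and runtime overhead are ignored.
def weights_gb(total_params_billions: float, bits_per_weight: float) -> float:
    """Size of the quantized weights alone, in GB."""
    return total_params_billions * 1e9 * bits_per_weight / 8 / 1e9

TOTAL_B = 30.5  # approximate total parameter count of Qwen3:30B-A3B

q4 = weights_gb(TOTAL_B, 4.5)  # ~17 GB -- roughly the ~20 GB launch footprint above
q1 = weights_gb(TOTAL_B, 1.6)  # ~6 GB -- in the neighborhood of the 5.84 GB figure
print(f"Q4-ish weights: ~{q4:.1f} GB (spills well past an 8 GB card)")
print(f"Q1-ish weights: ~{q1:.1f} GB (fits in 8 GB with a small context)")
```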