
Comment by kekePower

6 months ago

I have an RTX 3070 with 8GB of VRAM, and for me Qwen3:30B-A3B is fast enough. It's not lightning fast, but it's more than adequate if you have a _little_ patience.

I've found that Qwen3 is generally very good at following instructions, and you can easily toggle reasoning by adding "/no_think" to the prompt to turn it off.
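
If it helps, here's a minimal sketch of that toggle against a local Ollama server. The default port and the "qwen3:30b-a3b" tag are assumptions about your setup; the "/no_think" soft switch itself is Qwen3 behavior:

```python
# Minimal sketch: toggling Qwen3's reasoning via the "/no_think" soft switch.
# Assumes a local Ollama server on its default port (11434) and a model pulled
# as "qwen3:30b-a3b" -- adjust both for your setup.
import json
import urllib.request

def ask(prompt: str) -> str:
    payload = json.dumps({
        "model": "qwen3:30b-a3b",
        "prompt": prompt,
        "stream": False,  # one JSON object back instead of a token stream
    }).encode("utf-8")
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Default: the reply starts with a <think>...</think> reasoning block.
print(ask("Why is the sky blue?"))

# Appending the soft switch skips the reasoning block entirely.
print(ask("Why is the sky blue? /no_think"))
```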

The reason Qwen3:30B works so well is that it's a MoE (Mixture of Experts), with only about 3B of its 30B parameters active per token. I've tested the 14B model, and it's noticeably slower because it's a dense model.
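
To put rough numbers on that: token generation is mostly memory-bandwidth-bound, so speed tracks how many bytes of weights each token has to touch. A back-of-envelope sketch, where the bandwidth and bits-per-weight figures are assumptions rather than measurements:

```python
# Back-of-envelope: why 3B active parameters beat a 14B dense model when
# decoding is memory-bandwidth-bound. Both constants below are assumptions.
BYTES_PER_WEIGHT = 0.56   # ~4.5 bits/weight, a Q4-style quant
BANDWIDTH_GB_S = 50.0     # assumed effective bandwidth once weights spill to system RAM

def tokens_per_second(active_params_billions: float) -> float:
    # Each decoded token reads every active weight roughly once.
    bytes_per_token = active_params_billions * 1e9 * BYTES_PER_WEIGHT
    return BANDWIDTH_GB_S * 1e9 / bytes_per_token

print(f"30B-A3B (3B active): ~{tokens_per_second(3):.0f} tok/s")   # ~30
print(f"14B dense:           ~{tokens_per_second(14):.0f} tok/s")  # ~6
```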

How are you getting Qwen3:30B-A3B running with 8GB? On my system it takes 20GB of VRAM to launch it.

  • It offloads to system memory, but since there are "only" 3 billion active parameters, it works surprisingly well. I've been able to run models up to 29GB in size on my system with 32GB of RAM, albeit very, very slowly.

  • Probably offloading to regular RAM, I'd wager. Or it's really, really aggressively quantized: Qwen3:30B-A3B at Q1 with a 1k Q4 context uses 5.84GB of VRAM (see the back-of-envelope sketch after this list).
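
For what it's worth, the sizes in this thread line up with simple weight math. A rough estimator, where the bits-per-weight values are ballpark figures for llama.cpp-style quants rather than exact specs:

```python
# Rough VRAM math for the figures in this thread. Bits-per-weight values are
# ballpark averages for llama.cpp-style quants (assumptions, not specs), and
# the KV cache and runtime overhead are ignored.
def weights_gb(total_params_billions: float, bits_per_weight: float) -> float:
    """Size of the quantized weights alone, in GB."""
    return total_params_billions * 1e9 * bits_per_weight / 8 / 1e9

TOTAL_B = 30.5  # approximate total parameter count of Qwen3:30B-A3B

q4 = weights_gb(TOTAL_B, 4.5)  # ~17 GB -- roughly the ~20 GB launch footprint above
q1 = weights_gb(TOTAL_B, 1.6)  # ~6 GB -- in the neighborhood of the 5.84 GB figure
print(f"Q4-ish weights: ~{q4:.1f} GB (spills well past an 8 GB card)")
print(f"Q1-ish weights: ~{q1:.1f} GB (fits in 8 GB with a small context)")
```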