Comment by vardalab

6 days ago

Q4 quants on 32 GB of VRAM give you 131K context for the 35BA3B and 27B models, which are pretty capable. On a 5090 you get 175 tok/s tg (token generation) and ~7K tok/s pp (prompt processing) with 35BA3B; the 27B is around 90 tok/s tg. So speed is awesome. Even Strix 395 gives 40 tok/s and 256K context. Pretty amazing; there is a reason people are excited about Qwen 3.5.
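
To put those numbers in perspective, here is a rough back-of-the-envelope sketch of what they would mean for one request. It assumes the quoted pp and tg rates stay constant across the whole context, which is optimistic since both usually drop as the context fills up. The function name and the 1,000-token reply length are made up for illustration, and 131K is taken to mean 131,072 tokens.

```python
def estimate_seconds(prompt_tokens, output_tokens, pp_tps, tg_tps):
    """Return (prefill_seconds, decode_seconds) for a single request,
    assuming constant prompt-processing and generation rates."""
    prefill_s = prompt_tokens / pp_tps   # time to ingest the prompt
    decode_s = output_tokens / tg_tps    # time to generate the reply
    return prefill_s, decode_s


# Rates quoted above for 35BA3B on a 5090: ~7K tok/s pp, 175 tok/s tg.
# Assumptions: a full 131,072-token prompt and a 1,000-token reply.
prefill_s, decode_s = estimate_seconds(131_072, 1_000, 7_000, 175)
print(f"prefill ~{prefill_s:.1f}s, decode ~{decode_s:.1f}s")
```

Under those assumptions, filling the entire 131K window would take on the order of 19 seconds of prompt processing, and a 1,000-token reply about 6 seconds.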