Comment by causal
6 days ago
You COULD even do Qwen3.5-35B-A3B-GGUF.
The UD-IQ3-XXS quant is only 13.1GB and might outperform both on intelligence, and certainly on speed (only 3B activated): https://huggingface.co/unsloth/Qwen3.5-35B-A3B-GGUF
To leave room for the KV cache you will need to offload a few feed-forward layers to the CPU. It will still be quite fast.
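A rough back-of-the-envelope sketch of the "how many layers" question, in case it helps. Only the 13.1GB figure comes from the quant above; the VRAM size, cache size, overhead, layer count, and the share of weights sitting in feed-forward tensors are all placeholder assumptions, and `layers_to_offload` is a made-up helper, not part of any tool. Plug in your own numbers.

```python
import math

def layers_to_offload(model_gb, vram_gb, cache_gb, overhead_gb,
                      n_layers, ffn_fraction):
    """Estimate how many layers' feed-forward weights must live on the CPU
    so that the remaining weights plus cache fit in VRAM.

    Assumes feed-forward weights are spread evenly across layers, which is
    only an approximation.
    """
    deficit_gb = model_gb + cache_gb + overhead_gb - vram_gb
    if deficit_gb <= 0:
        return 0  # everything already fits on the GPU
    ffn_gb_per_layer = model_gb * ffn_fraction / n_layers
    # Round up: a partially offloaded layer still frees less than needed.
    return min(n_layers, math.ceil(deficit_gb / ffn_gb_per_layer))

# 13.1 GB is the file size quoted above; every other value is an assumption.
n = layers_to_offload(model_gb=13.1, vram_gb=12.0, cache_gb=1.5,
                      overhead_gb=0.5, n_layers=40, ffn_fraction=0.9)
print(f"offload feed-forward weights of about {n} layers")
```

Treat the result as a starting point and adjust it against actual memory usage, since the real cache size depends on context length and cache quantization.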
Edit: Actually, 27B does a little better than 35B on most benchmarks, but 35B will still be much faster.