Comment by regexorcist
12 hours ago
You don't mention runtime, hardware, or harness, which are critical. The 35B A3B model should be pretty fast; you need a decent setup, but nothing too fancy. I'm running the Q8_K_XL quant from Unsloth with llama.cpp and opencode, and it's pretty awesome. I find that opencode drives the model best: it very rarely gets stuck, even with a ton of tool calls. I agree it's comparable to Sonnet 4.5 for most tasks. You might also try the Gemma 3 models, which are faster but not as good for coding.
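For anyone curious what a setup like this looks like, here's a rough sketch of serving an Unsloth GGUF quant with llama.cpp's OpenAI-compatible server and pointing opencode at it. The repo/quant names and flag values are placeholders, not exact; check the model card for the real ones.

```shell
# Serve a GGUF quant with llama.cpp's built-in server.
# <user>/<model>-GGUF:Q8_K_XL is a placeholder for the actual Unsloth repo and quant tag.
llama-server \
  -hf <user>/<model>-GGUF:Q8_K_XL \
  --ctx-size 32768 \
  --port 8080

# Then configure opencode to use the local OpenAI-compatible endpoint,
# e.g. http://localhost:8080/v1, as a custom provider.
```

Offloading layers to the GPU (e.g. `-ngl 99`) is what makes the A3B MoE models feel fast; with only ~3B active parameters per token they run well even on modest hardware.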