Comment by mullen

1 day ago

This is what I did after struggling to get llama.cpp working at a decent speed on my M1 MacBook. The secret is to be very specific about your needs and targeted in what you use llama.cpp for. My setup is almost strictly for qwen3-coder, and now I get a fairly decent speed out of it. I also installed Cursor to check Claude and it all worked out well.

Are you talking about Qwen3 Coder 30b a3b Instruct from August 2025, which is a non-reasoning model? Or the more recent "Qwen3 Coder Next" from Feb this year with 80b params, 3b active? I found Qwen3 Coder Next to be quite good on openrouter [1], but couldn't run it locally.

[1] https://openrouter.ai/qwen/qwen3-coder-next

I don't know why we're even talking about Qwen3.6 for writing code when qwen3-coder exists. My experience is there's no contest. I'm using 30b with 96k context on a dedicated server.

  • For agentic workflows like tool use, editing codebases, multi-turn debugging?