Comment by oceanplexian

1 day ago

Honestly your best bet is to buy a $20 Claude subscription, ask Claude to set it all up with Pi and llama.cpp and come back in 20 minutes after a cup of coffee. This is also a good idea because it will help set expectations of what a local model can do vs. a frontier model.

This is what I did after struggling to get llama.cpp working at a decent speed on my M1 MacBook. The secret is to be very specific about your needs and targeted in what you are using llama.cpp for. My setup is almost strictly for qwen3-coder, and now I get a fairly decent speed out of it. I also installed Cursor to check Claude and it all worked out well.

  • Are you talking about Qwen3 Coder 30b a3b Instruct from August 2025, which is a non-reasoning model? Or the more recent "Qwen3 Coder Next" from Feb this year, with 80b params and 3b active? I found Qwen3 Coder Next to be quite good on openrouter [1], but couldn't run it locally.

    [1] https://openrouter.ai/qwen/qwen3-coder-next

  • I don't know why we're even talking about Qwen3.6 for writing code when qwen3-coder exists. My experience is there's no contest. I'm using 30b with 96k context on a dedicated server.

    • For agentic workflows like tool use, editing codebases, multi-turn debugging?