Comment by pierotofy

10 hours ago

Yes. Llama.cpp + Qwen3.6-35b (MTP) + OpenCode is quite capable and runs on a single RTX 3090 and is faster than most cloud models. Quality is like running edge models from 8-12 months ago. Setup details at https://github.com/pierotofy/LocalCodingLLM/

33 comments

pierotofy

jacobgold 9 hours ago

"Quality is like running edge models from 8-12 months ago."

That sounds great for hobbyists but IMHO it wasn't until Opus 4.6 was released six months go (Dec 25, 2025) that we had a model good enough for professionals to use as a primary driver of their coding agents. That seems to be the threshold worth aiming for.

sbrother 9 hours ago
I strongly agree on that being the release where these tools got good enough to substantially speed up my professional work. I have to admit I was super skeptical of AI coding until then.
- dnautics 9 hours ago
  
  for me (might be because of the language im using) i had a substantial bump around september and a huge bump around January.
  in my stuff now i use an OT library that claude put finishing touches on in September.
storus 3 hours ago

You can already get Opus 4.6 level of performance on subtasks with some local models. So you need to pick a proper code writer, plan writer, code tester etc. model that matches your target expectations and use a coding tool that allows calling different LLMs for different subtasks. For example, people use StepFun 3.x or DeepSeek4-Flash for planning, Qwen3.6-27B for coding.
Projectiboga 9 hours ago

So thalen it might be 6-8 months to get to useable on a local open model? Of course state of the art will be a year ahead, a generation at the current pace.
pierotofy 9 hours ago
I use it for work.
- jacobgold 9 hours ago
  
  That's cool if you prefer it, but it is hard to imagine it being a strictly rational choice when much better quality is available at a price that is small relative to the cost of an employee. Or is there something specific about your use-case?
  
  4 replies →
epolanski 4 hours ago

Why don't you people bother to try instead of chasing the latest shiny thing?
You must be the type of crowd that writes websites with React and Tailwind and pretend to be engineers and have an opinion on everything.

trueno 9 hours ago

i have a 128gb m4 max macbook pro i've been wanting to tinker with this stuff but genuinely never find the time. any mac users in here running similar to the above that can share their experience?

i always see great debates with local stuff but the space is constantly moving goalposts and all the vernacular is pretty unfamiliar to me. i'd love to understand what people with objective experience feel they've traded away (or gained) when going local so i can determine for myself if these things are a good fit.

brycesub 9 hours ago
If you have a 128GB Mac you really ought to try out: https://github.com/antirez/ds4 by the creator of redis. This is probably as close to it gets to state-of-the-art local LLM + agentic coding.
- __mharrison__ 6 hours ago
  
  Using this just this morning on my DGX Spark. A little slower than frontier models but my $200/mo weekly usage exhausted with 3 days left on the week...
  (Shouldn't have done that refactoring job in high mode)
- trueno 7 hours ago
  
  well this is supremely interesting thanks for putting it on my radar
- lostlogin 8 hours ago
  
  Thank you.
htrp 9 hours ago

Use your ClaudeCode sub and tell it to set it up for you
dirkolbrich 6 hours ago

I have the same machine. You might look into https://omlx.ai/ a „macOS-native MLX server“. pi.dev for the agent with MCP, web-search and sub-agents extension.

atomicnumber3 10 hours ago

Same. I have no desire to use Claude at all anymore.

pierotofy 9 hours ago
Yep. Screw Anthropic, CloseAI and all other rent seekers in this space.
- akulbe 8 hours ago
  
  I have an M2 Max MBP with 96GB of RAM. What models and setup would you use for this kind of configuration?
  
  1 reply →

daveidol 9 hours ago

Do you do your dev work on the windows machine (referenced in the docs), or do you remotely access it from a separate machine? I ask because I have a RTX 3090 kicking around in a gaming desktop, but I don't use it for any dev work (I use a Macbook Pro).

snake_n_my_boot 7 hours ago

I have a similar set up and have been using it to learn and tinker with open models. I run Ollama on the gaming desktop and point OpenCode to it from my MacBook. Works nicely for me so far.

lelandbatey 9 hours ago

I use it, it's good, I get work done, but know that they really mean it when they say

> "Quality is like running edge models from 8-12 months ago"

Don't expect Opus, expect more like Haiku. If you micromanage it, you'll get great results. If you want it to be a human in a box, it'll flounder.

dheera 9 hours ago

Am I doing something wrong or has ollama become shittified?

I'm looking at https://ollama.com/search and the top few models like kimi-k2.7-code say "cloud" and I can't seem to ollama pull them.

I thought the whole POINT of ollama was not-cloud?

satvikpendem 9 hours ago

Ollama is not recommended to be used. Use llama.cpp.
hoherd 9 hours ago

I experienced the same situation a month or two ago. One of my friends sent me this article that was illuminating. https://sleepingrobots.com/dreams/stop-using-ollama/
jubilanti 7 hours ago

> I thought the whole POINT of ollama was not-cloud?
It was at first, then the developers realized they had a massive userbase they could monetize. A tale as old as open source...
jmorgan 9 hours ago

The larger models are available on Ollama's cloud as most folks don't have the hardware to run 500B-1T parameter models.
toyg 9 hours ago

Yes, you've nailed it. Ollama are desperately trying to pull a Cursor - like 3791 other projects in this space.

dominotw 9 hours ago

how much does the setup cost if i want to buy all the hardware now and increased power costs?