Comment by pierotofy

5 hours ago

Yes. Llama.cpp + Qwen3.6-35b (MTP) + OpenCode is quite capable and runs on a single RTX 3090 and is faster than most cloud models. Quality is like running edge models from 8-12 months ago. Setup details at https://github.com/pierotofy/LocalCodingLLM/

"Quality is like running edge models from 8-12 months ago."

That sounds great for hobbyists but IMHO it wasn't until Opus 4.6 was released six months ago (Dec 25, 2025) that we had a model good enough for professionals to use as a primary driver of their coding agents. That seems to be the threshold worth aiming for.

  • I strongly agree on that being the release where these tools got good enough to substantially speed up my professional work. I have to admit I was super skeptical of AI coding until then.

    • for me (might be because of the language im using) i had a substantial bump around september and a huge bump around January.

      in my stuff now i use an OT library that claude put finishing touches on in September.

  • So then it might be 6-8 months to get to usable on a local open model? Of course state of the art will be a year ahead, a generation at the current pace.

  • I use it for work.

    • That's cool if you prefer it, but it is hard to imagine it being a strictly rational choice when much better quality is available at a price that is small relative to the cost of an employee. Or is there something specific about your use-case?

i have a 128gb m4 max macbook pro. i've been wanting to tinker with this stuff but genuinely never find the time. any mac users in here running something similar to the above who can share their experience?

i always see great debates with local stuff but the space is constantly moving goalposts and all the vernacular is pretty unfamiliar to me. i'd love to understand what people with objective experience feel they've traded away (or gained) when going local so i can determine for myself if these things are a good fit.

Do you do your dev work on the Windows machine (referenced in the docs), or do you remotely access it from a separate machine? I ask because I have an RTX 3090 kicking around in a gaming desktop, but I don't use it for any dev work (I use a MacBook Pro).

  • I have a similar setup and have been using it to learn and tinker with open models. I run Ollama on the gaming desktop and point OpenCode to it from my MacBook. Works nicely for me so far.
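
For anyone trying the same desktop-plus-laptop split, here is a rough sketch of a reachability check to run from the laptop before pointing a coding tool at the desktop. It assumes Ollama is on its default port (11434) and that the desktop was started with OLLAMA_HOST=0.0.0.0 so it accepts connections from the LAN; by default it only listens on localhost. The function names and the example hostname are made up for illustration and are not taken from the comment above.

```python
import json
import urllib.request

OLLAMA_PORT = 11434  # Ollama's default port; change it if your server uses another


def tags_url(host: str, port: int = OLLAMA_PORT) -> str:
    """Build the URL of Ollama's model-listing endpoint (GET /api/tags)."""
    return f"http://{host}:{port}/api/tags"


def model_names(payload: dict) -> list[str]:
    """Pull the model names out of an /api/tags response body."""
    return [m["name"] for m in payload.get("models", [])]


def list_remote_models(host: str, timeout: float = 5.0) -> list[str]:
    """Ask the remote Ollama server which models it has pulled.

    Raises urllib.error.URLError if the server cannot be reached, which
    usually means a firewall or the localhost-only default binding.
    """
    with urllib.request.urlopen(tags_url(host), timeout=timeout) as resp:
        return model_names(json.load(resp))
```

Calling list_remote_models("gaming-desktop.local") (a hypothetical hostname) from the laptop should return the models already pulled on the desktop. If it raises instead, the network path is the thing to fix before touching any OpenCode config.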

I use it, it's good, I get work done, but know that they really mean it when they say

> "Quality is like running edge models from 8-12 months ago"

Don't expect Opus, expect more like Haiku. If you micromanage it, you'll get great results. If you want it to be a human in a box, it'll flounder.

Am I doing something wrong or has ollama become shittified?

I'm looking at https://ollama.com/search and the top few models like kimi-k2.7-code say "cloud" and I can't seem to ollama pull them.

I thought the whole POINT of ollama was not-cloud?

how much does the setup cost if i want to buy all the hardware now, including the increased power costs?