Comment by jacobgold
10 hours ago
"Quality is like running edge models from 8-12 months ago."
That sounds great for hobbyists but IMHO it wasn't until Opus 4.6 was released six months ago (Dec 25, 2025) that we had a model good enough for professionals to use as a primary driver of their coding agents. That seems to be the threshold worth aiming for.
I strongly agree on that being the release where these tools got good enough to substantially speed up my professional work. I have to admit I was super skeptical of AI coding until then.
For me (might be because of the language I'm using) I had a substantial bump around September and a huge bump around January.
In my stuff now I use an OT library that Claude put finishing touches on in September.
You have your dates and models wrong: it was Opus 4.5, released in November 2025, that changed everything. Opus 4.6 was released in February 2026.
You're right. December is when things felt different but Opus 4.5 was actually released November 24, 2025.
https://www.anthropic.com/news/claude-opus-4-5
You can already get Opus 4.6 level of performance on subtasks with some local models. So you need to pick a suitable model for each role (code writer, plan writer, code tester, etc.) that matches your target expectations, and use a coding tool that allows calling different LLMs for different subtasks. For example, people use StepFun 3.x or DeepSeek4-Flash for planning, Qwen3.6-27B for coding.
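To make the per-subtask idea concrete, here is a minimal sketch of what that routing could look like. It is not any particular tool's config: `ROLE_TO_MODEL`, `pick_model`, `run_pipeline`, and the `call_model` callback are all hypothetical names, and the model strings are just the labels mentioned above, not verified identifiers.

```python
from typing import Callable

# Hypothetical role-to-model table. The strings are plain labels taken from
# the comment above, not verified model identifiers for any real server.
ROLE_TO_MODEL = {
    "planning": "DeepSeek4-Flash",
    "coding": "Qwen3.6-27B",
}

# Assumption: roles without an explicit entry fall back to the coding model.
DEFAULT_MODEL = "Qwen3.6-27B"


def pick_model(role: str) -> str:
    """Return the model label configured for a subtask role."""
    return ROLE_TO_MODEL.get(role, DEFAULT_MODEL)


def run_pipeline(task: str, call_model: Callable[[str, str], str]) -> str:
    """Plan with one model, then implement the plan with another.

    call_model(model, prompt) is supplied by the caller and stands in for
    whatever local inference endpoint is actually used.
    """
    plan = call_model(pick_model("planning"), f"Write a plan for: {task}")
    return call_model(pick_model("coding"), f"Implement this plan:\n{plan}")
```

In practice `call_model` would wrap whatever local server you run; the point is only that the planner and the coder can be swapped independently.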
So then it might be 6-8 months to get to usable on a local open model? Of course state of the art will be a year ahead, a generation at the current pace.
I use it for work.
That's cool if you prefer it, but it is hard to imagine it being a strictly rational choice when much better quality is available at a price that is small relative to the cost of an employee. Or is there something specific about your use-case?
Not all work requires every facet to be so sharply optimized, and there may be other constraints that are completely invisible to you. Some that were easy for me to imagine: the parent works in a heavily regulated industry; their IT team is slow-moving and paranoid and this is a safe, under-the-radar workaround; the output is "good enough" for their purposes and they find tinkering with it to be fun.
Regardless I don't think it's fruitful to be so condescending with so little insight into this person's situation. Even if you had total insight -- let people be and withhold your judgement, or at least keep it to yourself. Making people feel stupid is a great way to turn people off to pretty much anything else you have to say.
To me, what's not rational is believing you must rent the tools of your trade while exposing all of your employer's intellectual property to a third party. Difference of opinion.
Won’t it depend on what you use it for? A less capable system might be fine for boilerplate, moderate refactoring, etc. Not everyone is building whole features in one go.
Why don't you people bother to try instead of chasing the latest shiny thing?
You must be the type of crowd that writes websites with React and Tailwind, pretends to be engineers, and has an opinion on everything.