Comment by jobs_throwaway
1 day ago
It depends what you mean by locally. I don't foresee running a model on my laptop anytime soon to power a coding agent. Far more likely is an infra team at my company operating an open source model on cloud infrastructure. When they're already paying $1000 / month / dev, it starts to pencil out pretty quickly.
Is there any open model as good as opus 4.6 at any price?
Kimi 2.6 probably. Needs over 300GB of GPU memory to run (1TB for full capabilities), so either 4x A100 or 8x A6000 would do it.
A $50k-100k rig could do it, and an entire company would be able to use it at full speed.
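A rough back-of-envelope for the numbers above, as a minimal sketch rather than a real sizing exercise. The $1000 / month / dev spend, the 300GB requirement, and the $50k-100k rig price come from this thread. The per-GPU memory figures assume the 80GB A100 variant and the 48GB RTX A6000, and the team size of 20 and the zero ops cost are made-up placeholders.

```python
import math

# Per-GPU memory in GB. Assumption: 80GB A100 variant, 48GB RTX A6000.
GPU_MEMORY_GB = {"A100": 80, "A6000": 48}


def total_vram_gb(gpu: str, count: int) -> int:
    """Total GPU memory of a rig with `count` identical GPUs."""
    return GPU_MEMORY_GB[gpu] * count


def fits(gpu: str, count: int, required_gb: int = 300) -> bool:
    """True if the rig meets the memory requirement quoted in the thread."""
    return total_vram_gb(gpu, count) >= required_gb


def breakeven_months(
    rig_cost_usd: float,
    devs: int,
    api_cost_per_dev_month: float = 1000.0,  # figure quoted in the thread
    ops_cost_per_month: float = 0.0,         # hypothetical: power, hosting, staff
) -> float:
    """Months until the rig costs less than the per-dev spend it replaces."""
    monthly_savings = devs * api_cost_per_dev_month - ops_cost_per_month
    if monthly_savings <= 0:
        return math.inf  # the rig never pays for itself
    return rig_cost_usd / monthly_savings


if __name__ == "__main__":
    for gpu, count in [("A100", 4), ("A6000", 8)]:
        print(gpu, count, total_vram_gb(gpu, count), "GB, fits:", fits(gpu, count))
    for cost in (50_000, 100_000):
        print(f"${cost:,} rig, 20 devs:", breakeven_months(cost, devs=20), "months")
```

This ignores throughput, utilization, and any quality gap between models, so treat the result as an upper bound on how fast it pencils out, not a forecast.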
How many problems require Opus-4.6-level performance? The "I'll accept nothing but the very best model for any task" thinking is perplexing to me.
People got a lot done before Opus 4.6. In 6 months, would you be dissatisfied by Opus-4.6-level open-weight models, just because Opus 4.8 will be out?
Not OP but I've been thinking about this a lot (like everyone ha) and I think my answer is, yes?
I hope there's a "good enough" point but I don't think we're there yet. Like for me hardware got good enough several years ago. But while opus 4.7 is really good compared to everything else, it's not so good that I would use it at a discount over whatever is available in a few months. The improvement in quality, speed, and daily frustration is worth it to me... Spoken as someone whose employer is footing the bill, so take that with a grain of salt.
I want to run my own local models, but I don't think that's feasible without lots of frustration until a few generations of frontier models are so good that they're almost indistinguishable for common tasks. Kind of like how MacBook Pros have been for a while.
I'm very happy to have multiple sessions open (and do) and switch between fast and slow models, and if there were a batch mode in Codex or Claude Code I would use it. (Just like I sometimes use Codex fast mode.)
But at the moment, I can't imagine why I wouldn't be spending the majority of my time with the best models. I'm spending a lot of time with them! Reducing the number of back-and-forths is extremely valuable to me.
I expect in two months I will still want to spend >80% of my time prompting the best models, and that's true if I were spending my own money on hobby projects, too.
Well, I see what you mean, but two big concepts...
1A. Models get stale pretty quickly w.r.t. new developments that occur past their cutoff date. "But you can just keep them current by linking them to newer documentation, etc!" Well, no, you sorta can't -- at least not in perpetuity. Those search results fill up your context window real quick, so that gets unsustainable fast.
1B. Even when your context has plenty of free space, the results you get from "here's a link to the documentation for this new framework that released after your cutoff date" absolutely pale in comparison to the results you get from knowledge that is fully baked into the trained model. For one thing, that documentation link you pasted into your context might link to... a dozen code examples. Whereas if that were baked into the model itself, the model might have been trained on many thousands of examples on GitHub etc.
2. It's also a reality that most professional engineers have to keep up with their peers and competitors. We can maybe say it shouldn't be that way, but it is. So if $SOME_NEW_MODEL is significantly better than 4.6... and my peers and/or competitors are using it, then yeah, I might be really feeling the need to match them. And I'm not even necessarily talking about some kind of cutthroat dog-eat-dog stack-ranked workplace.
These limitations aren't relevant for all use cases or careers but they're hiiiiiiiighly relevant for professional software engineering.
No, but the big open models are on the level of Sonnet 4.6, which is very good for most problems.
The people who are claiming Opus-level capability do not have sufficiently complex problems to see the difference.
And neither side brings any evidence ...
For coding I don't think so, but they are very close. I code with Sonnet mostly because I think Opus is just useful if you fail to dissect problems adequately, but anyway.
Kimi is close, for example, on SWE-bench for code. For reasoning there are open models that surpass Opus by quite a margin already.