Comment by svara

1 day ago

There's still a lot of room for the best models to get better at coding.

Your argument rests on the "for marginal gains" part but it's really not clear that the gains are marginal in the foreseeable future.


simplyluke 20 hours ago

This is totally valid and I don't agree with the downvotes you're getting. Someone coming out with a 10x improvement is possible and would change the game immediately. The thing is, we really have been seeing marginal gains with shifting leaders in who's got the "best" since GPT-3, and at least as a user of these tools that pace has been slowing, not accelerating. Subjectively it feels like we're in the back half of an S-curve.

We're 3.5 years into this current AI wave, and a lot of the valuations have been predicated on what you're arguing here -- that essentially should one of the labs make an order-of-magnitude improvement or hit escape velocity on recursive self-improvement they'd become the most powerful economic chokepoint in history.

The reality has been that given access to compute + capital all of the labs can stay pretty competitive with each other. Someone does a bit better on coding, someone else does a bit better on tool calling, and then they swap after each spending another $100bn.

The market looks like a commodity market where the commodity is intelligence, not a winner-take-all market with massive margins. Plenty of people get rich in oil and airlines, but they notably don't tend to be the innovators long term, they tend to be the operators. Obviously if the machines become sentient tomorrow, turn on their masters, and hit world-dominating intelligence, that assessment changes, but after several years of that narrative while objective reality looks quite different I think the more sober voices are starting to gain a foothold.

svara 10 hours ago
I agree with most of what you're saying, but I think the point I was trying to make wasn't as high-flying as you and others understood it.
I'd pay a premium for even just a model that's 20% better, no ASI required, and I think a lot of people would. I wouldn't call that marginal, if it means I'm getting frustrated on 20% fewer tasks.
A recurring pattern that I've seen in myself and others is to at first be very impressed by a new model's coding capabilities, and then desensitize quickly and start being frustrated by the shortcomings.
- simplyluke 3 hours ago
  
  > I'd pay a premium for even just a model that's 20% better
  
  The point I'm making is that I think we're rapidly hitting levels where corporate buyers aren't willing to pay multiple-times-more for marginal gains, and I expect that to become more the case over time, not less. You, and a small % of other power users in the market might tolerate a $400/month pro-supreme-plan for access to Mythos or whatever, but I don't think that's going to scale up in quite the same ways we've seen so far.
  Even a year ago, paying multiple times more for a 50% gain was very sensible for a lot of workflows. But if we're getting to "good enough" for things like coding, justifying to your CTO/CFO why the org should go from spending $1m/year to $5m/year for a 10% higher hit-rate on one-shot prompts from the engineers is a much tougher sell.

yfw 20 hours ago

What? The gains from GPT-4 to GPT-5 seem marginal. No PhD-level discoveries here.

simonw 20 hours ago
The leap from GPT-4 to GPT-5.5 has been astounding in my opinion. There is no way GPT-4 could run a coding agent harness like Codex at even a fraction of the quality that GPT-5.5 does.
- anon373839 19 hours ago
  
  I don’t think that’s exactly indicative of GPT-5.5 being an astoundingly more intelligent model, however. An alternate interpretation is that GPT-5.5 was trained on tool usage/harness patterns and has been optimized for this use case.
  I remember that even when GPT-4 was king, the Gorilla paper showed that Llama 7B could be fine-tuned to outperform GPT-4 on tool calling.
  On domains that don’t involve agentic tool calling*, I haven’t found the frontier to have advanced that much.
  Edit: I should broaden this to domains that naturally lend themselves to RLVR training. Models are drastically better at math now.