Comment by sinenomine

4 days ago

People underestimate the lead OAI has with their post-5.2 models. The author does not strike me as someone who closely follows the progress frontier labs are making in the US and around the world.

It's a twofold ignorance: of how these frontier models get built, and of what consumers actually want.

Many pundits think it's just a matter of scraping the internet and having a few ML scientists run ablation experiments to tune hyperparameters. That hasn't been true for over a year. The current requirements are more org-scale, more payoff from scale, more moat. The main legitimate competitive threat is adversarial distillation.

Many pundits also think that consumers don't want to pay a premium for small differences on the margin. That is very wrong-headed. I pay $200/month to a frontier lab because, even though it's only a few % higher in benchmark scores, it is 5x more useful on the margin.

  • It is the benchmark error rate, not the benchmark success %, that we actually trip up on.

    Going from 85% to 90% cuts the error rate from 15% to 10%, i.e. a third fewer errors, or even more depending on the distribution of work you're doing.
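
    A quick sketch of that arithmetic (the 85% and 90% figures are the hypothetical benchmark scores above; the function name is my own):

    ```python
    def relative_error_reduction(old_score: float, new_score: float) -> float:
        """Fraction of errors eliminated when accuracy rises from old_score to new_score."""
        old_err = 1 - old_score
        new_err = 1 - new_score
        return (old_err - new_err) / old_err

    # 85% -> 90% accuracy: error rate falls from 15% to 10%,
    # which is about 1/3 of the errors eliminated.
    reduction = relative_error_reduction(0.85, 0.90)
    ```

    A 5-point gap near the top of a benchmark is a much larger gap in errors than it looks, which is the point of the comment above.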

  • > The current requirements are more org-scale, more payoff from scale, more moat.

    What moat? None of the AI providers have a moat at the moment, and the trend doesn't indicate that any of them will in the near future.

  • Do you pay OpenAI, or which lab do you use? Do you switch regularly?

    • I pay OpenAI but I would also be a happy Anthropic customer.

      My view is that OpenAI, Anthropic and Google have a good moat. It's now an oligopolistic market with extreme barriers to entry due to the scale required. The moat will keep growing as the payoffs from scale keep growing. They benefit from internal economies of scale and scope as the breadth of their synthetic data expands. The small differences between the labs now are the initial conditions that will magnify into larger differences later.

      It wouldn't be surprising to also see consolidation of the industry in the next 2 years which makes it even more difficult to compete, as 2 or 3 winners gobble up everyone and solidify their leads.

      When people worry about the frontier labs' moat, they point to open-weights models, which is really a commentary that these models cost nothing to replicate (like all software). But I think the era of open-weights competition can't be sustained; it's a temporary phenomenon tied to the middle-ground scale we're in, where labs can still afford to release weights. The absolute end of it will come with the end-game of nation-state-backed competition.

Agreed. Compare the frontier models from Google and OAI; it's like night and day. Anyone who says "the tech has caught up" has not spent even one day using Gemini 3.1 trying to accomplish something complicated.