Comment by syntaxing

1 month ago

Hacker News strongly believes Opus 4.5 is the de facto standard and that China has consistently been 8+ months behind. Curious how this performs. It’ll be a big inflection point if it performs as well as its benchmarks.

11 comments

Flavius 1 month ago

Based on their own published benchmarks, it appears that this model is at least 6 months behind.

spwa4 1 month ago
Strange how things evolve. When ChatGPT started, it had about a 2-year head start over Google's best proprietary model, and was more than 2 years ahead of open source models.
Now they have to be lucky to be 6 months ahead of an open model with at most half the parameter count, trained on 1%-2% of the hardware US models are trained on.
- rglullis 1 month ago
  
  And more than that, the need for people/businesses to pay the premium for SOTA is getting smaller and smaller.
  I thought that OpenAI was doomed the moment that Zuckerberg showed he was serious about commoditizing LLMs. Even if Llama wasn't the GPT killer, it showed that there was no secret formula and that OpenAI had no moat.
  
- rbtprograms 1 month ago
  
  It seems they believed that superior models would be the moat, but when DeepSeek essentially replicated o1 they switched to the ecosystem as the moat.
- DeathArrow 1 month ago
  
  >Now they have to be lucky to be 6 months ahead of an open model with at most half the parameter count, trained on 1%-2% of the hardware US models are trained on.
  Maybe there's a limit in training, and throwing more hardware at it yields very little improvement?

oersted 1 month ago

In my experience GPT-5.2 with extra-high thinking is consistently a bit better and significantly cheaper (even when I use the Fast version which is 2x the price in Cursor).

The HN obsession with Claude Code might be a bit biased by people trying to justify their expensive subscriptions to themselves.

However, Opus 4.5 is much faster and very high quality too, and that ends up mattering more in practice. I end up using it much more and paying a dear but worthwhile price for it.

PS: Despite what the benchmarks say, I find Gemini 3 Pro and Flash to be a step below Claude and GPT, although still great compared to last year's state of the art, and very fast and cheap. Gemini also seems to have a less AI-sounding writing style.

I am aware this is all quite vague and anecdotal, just my two cents.

I do think these kinds of opinions are valuable. Benchmarks are a useful reference, but they do give the illusion of certainty to something that is fundamentally much harder to measure and quite subjective.

manmal 1 month ago

Better, yes, but cheaper - only when looking at API costs I guess? Who in their right mind uses the API instead of the subsidized plans? There, Opus is way cheaper in terms of subsidized tokens.
sandos 1 month ago

I've been using GPT-5.1, 5.1-codex, 5.1-codex-max and gpt-5.2 over the last few weeks. Then I got tipped off about Opus, and that it was supposed to be awesome. The problem is I can clearly see old patterns of "Oooh, I found the issue!" in the middle of the stream, long before it has found the real issue I was asking about, and the results are not very good. The GPT family to me is better.
I was especially impressed by 5.1-codex-max for a webapp, but that is ofc where these models in general shine. But it was freaky: I had never had 15-20 iterations (with 100s of lines added each time) before where I did not have to correct anything.
anonzzzies 1 month ago

You are using Opus via the API? $200/mo is nothing for what I get for it, so I'm not sure how it is considered expensive. I guess it depends on how you use it; I hit the limits every day. Using the API, I would indeed be paying through the nose, but why would anyone?
keyle 1 month ago

My experience exactly.