Comment by nimchimpsky

1 day ago

Barely competitive? The Mythos column is the first column.

You are the only person with this take on Hacker News; everyone else is saying "this is a massive jump". FWIW, the data you list shows the biggest jump I remember for Mythos.


devmor 1 day ago

The biggest jump in the numbers they quoted is 6%.

Please look at the columns OTHER than Opus as well.

josephg 1 day ago
> Combined results (Claude Mythos / Claude Opus 4.6 / GPT-5.4 / Gemini 3.1 Pro)
> Terminal-Bench 2.0: 82.0% / 65.4% / 75.1% / 68.5%
> USAMO: 97.6% / 42.3% / 95.2% / 74.4%
> The biggest jump in the numbers they quoted is 6%.
Just in the numbers you quoted, that's a 16.6-point jump on Terminal-Bench and a 55.3-point absolute increase on USAMO over their previous Opus 4.6 model.
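
For anyone who wants to check the arithmetic, here is a minimal sketch that recomputes the gaps from the two rows quoted above. The dictionary layout and the `point_gaps` helper are illustrative names only, not anything from the original post, and the gaps are in percentage points (simple subtraction), not relative percent.

```python
# Scores in percent, copied from the two benchmark rows quoted above.
scores = {
    "Terminal-Bench 2.0": {
        "Claude Mythos": 82.0,
        "Claude Opus 4.6": 65.4,
        "GPT-5.4": 75.1,
        "Gemini 3.1 Pro": 68.5,
    },
    "USAMO": {
        "Claude Mythos": 97.6,
        "Claude Opus 4.6": 42.3,
        "GPT-5.4": 95.2,
        "Gemini 3.1 Pro": 74.4,
    },
}


def point_gaps(row, model="Claude Mythos"):
    """Return `model`'s score minus every other model's score, in percentage points."""
    return {
        other: round(row[model] - score, 1)
        for other, score in row.items()
        if other != model
    }


if __name__ == "__main__":
    for bench, row in scores.items():
        print(bench, point_gaps(row))
```

Which column you compare against changes the size of the gap a lot, which seems to be most of what this subthread is disagreeing about.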
- devmor 1 day ago
  
  I don’t know if you’re willingly disregarding everything being said to you or there’s a language barrier here.
DroneBetter 13 hours ago

this just in: HN user forgets how sigmoid functions work
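
One possible reading of the sigmoid remark, sketched under a loud assumption: if a benchmark score behaves like a logistic function of some underlying ability, then equal point gains near the ceiling correspond to larger shifts on the log-odds scale. That model is an assumption made here for illustration, not something the quoted numbers establish, and `logit` and `error_rate` are hypothetical helper names.

```python
import math


def logit(score_percent):
    """Log-odds of a score given in percent (must be strictly between 0 and 100)."""
    p = score_percent / 100.0
    return math.log(p / (1.0 - p))


def error_rate(score_percent):
    """Share of problems missed, in percent."""
    return 100.0 - score_percent


if __name__ == "__main__":
    # USAMO figures quoted earlier in the thread: GPT-5.4 vs Claude Mythos.
    old, new = 95.2, 97.6
    print("point gain:", round(new - old, 1))
    print("error-rate ratio:", error_rate(new) / error_rate(old))
    print("log-odds gain:", logit(new) - logit(old))
```

On this view a small point gain close to 100% can still mean a large cut in the error rate, so raw point differences understate progress near the top of the scale.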
nl 1 day ago

It's higher than all the other models on everything except MMMLU, where Gemini 3.1 Pro is ahead.