Comment by nimchimpsky
1 day ago
Barely competitive? The Mythos column is the first column.
You are the only person with this take on Hacker News; everyone else is saying "this is a massive jump". FWIW, the data you list shows the biggest jump I remember for Mythos.
The biggest jump in the numbers they quoted is 6%.
Please look at the columns OTHER than Opus as well.
> Combined results (Claude Mythos / Claude Opus 4.6 / GPT-5.4 / Gemini 3.1 Pro)
> Terminal-Bench 2.0: 82.0% / 65.4% / 75.1% / 68.5%
> USAMO: 97.6% / 42.3% / 95.2% / 74.4%
> The biggest jump in the numbers they quoted is 6%.
Just in the numbers you quoted, that's a 16.6 percentage point jump on Terminal-Bench and a 55.3 percentage point absolute increase on USAMO over their previous Opus 4.6 model.
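For anyone who wants to check the arithmetic, here is a minimal Python sketch that uses only the scores quoted above. The `SCORES` layout and the function name `jumps_vs_opus` are my own illustration, not anything from the benchmark report.

```python
# Scores copied from the quoted table, in column order:
# (Claude Mythos, Claude Opus 4.6, GPT-5.4, Gemini 3.1 Pro)
SCORES = {
    "Terminal-Bench 2.0": (82.0, 65.4, 75.1, 68.5),
    "USAMO": (97.6, 42.3, 95.2, 74.4),
}


def jumps_vs_opus(scores):
    """Return Mythos minus Opus 4.6, in percentage points, per benchmark."""
    return {name: round(row[0] - row[1], 1) for name, row in scores.items()}


if __name__ == "__main__":
    for name, delta in jumps_vs_opus(SCORES).items():
        print(f"{name}: +{delta} points over Opus 4.6")
```

Both differences are absolute (percentage points), not relative improvements.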
I don’t know if you’re willfully disregarding everything being said to you or there’s a language barrier here.
this just in: HN user forgets how sigmoid functions work
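To spell out the sigmoid point: if you treat benchmark accuracy as a sigmoid of some underlying ability score (an assumption for illustration, not something the benchmark authors state), a point gained near the ceiling reflects more underlying progress than a point gained mid-range. A rough sketch that converts the quoted scores to log-odds, with `logit` and `logit_gain` as hypothetical helper names:

```python
import math


def logit(p):
    """Log-odds of a probability p in (0, 1); the inverse of the sigmoid."""
    return math.log(p / (1.0 - p))


def logit_gain(old_pct, new_pct):
    """Gain in log-odds when a score moves from old_pct to new_pct (percent)."""
    return logit(new_pct / 100.0) - logit(old_pct / 100.0)


if __name__ == "__main__":
    # Opus 4.6 -> Mythos, using the scores from the quoted table
    print("Terminal-Bench 2.0:", round(logit_gain(65.4, 82.0), 2))
    print("USAMO:", round(logit_gain(42.3, 97.6), 2))
```

On that scale the last few points before 100% cost more than the same number of points in the middle, so raw percentage deltas understate gains near saturation.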
It's higher than every other model, except against Gemini 3.1 Pro on MMMLU.