Comment by InkCanon

10 months ago

That's fair. But look up the recent experiment testing SOTA models on the then-just-released USAMO 2025 questions. Highest score was 5%, supposedly SOTA last year was IMO silver level. There could be some methodological differences, e.g. the USAMO paper required correct proofs and not just numerical answers. But it really strongly suggests that even within limited domains, it's cheating. I'd wager a significant amount that if you tested SOTA models on a new set of ICPC questions, actual performance would be far, far worse than their supposed benchmarks.

> Highest score was 5%, supposedly SOTA last year was IMO silver level.

No LLM got silver last year. DeepMind had a highly specialized AI system earning that.