Comment by InkCanon

10 months ago

That's fair. But look up the recent experiment that ran SOTA models on the then just-released USAMO 2025 questions. Highest score was 5%, supposedly SOTA last year was IMO silver level. There could be some methodological differences, e.g. the USAMO paper required correct proofs and not just numerical answers. But it strongly suggests that, even within limited domains, it's cheating. I'd wager a significant amount that if you tested SOTA models on a new set of ICPC questions, actual performance would be far, far worse than their reported benchmark scores.

usaar333 10 months ago

> Highest score was 5%, supposedly SOTA last year was IMO silver level.

No LLM last year got silver. DeepMind had a highly specialized AI system that earned it.