Comment by vonneumannstan
15 hours ago
>it seems that the answer to whether or not a general model could perform such a feat is that the models were trained specifically on IMO problems, which is what a number of folks expected.
Not sure that's exactly what that means. It's already likely the case that the pretraining data for these models contained IMO problems and solutions. It's possible this means they were present in the system prompt or something similar.
Does the IMO reuse problems? My understanding is that new problems are submitted each year and 6 are selected for each competition. The submitted problems are then published after the IMO has concluded. How would the training data contain unpublished, newly submitted problems?
Obviously the training data contained similar problems, because that's what every IMO participant already studies. It seems unlikely that they had access to the same problems though.
The IMO doesn't reuse problems, but Terence Tao has a Mastodon post where he explains that the first five (of six) problems are generally ones where existing techniques can be leveraged to get to the answer, while the sixth problem requires considerable originality. Notably, neither Gemini nor OpenAI's model got the sixth problem. Still quite an achievement, though.
Do you have another source for that? I checked his Mastodon feed and don't see any mention of where the IMO questions come from.
https://mathstodon.xyz/@tao
Strange statement; it's not true in general (problems 3 and 6 are typically the hardest, but they certainly aren't of a fundamentally different nature from the other questions). This year P6 did seem to be by far the hardest, but this post-hoc statement should be read cautiously.
>How would the training data contain unpublished, newly submitted problems?
I don't think I or the OP suggested it did.
Or that they did significant retraining to boost IMO performance, creating a more specialized model at the cost of general-purpose performance.