Comment by torginus
4 days ago
From my vague remembrance of doing data science years ago, it's very hard to keep the validation set from leaking into the training process.
Basically, how you do RL is that you build a set of input-output training pairs and set aside a smaller validation set, which you never train on, to check whether your model is doing well.
What you do is tweak the architecture and the training set until the model does well on the validation set. By doing so, you inadvertently leak information about the validation set into the model. Perhaps you choose an architecture that happens to do well on the validation set. Perhaps you train more on examples that resemble the ones being validated.
Even without any explicit intent to cheat, this contamination is very hard to avoid: if you chose a different validation set, you'd end up with a different model.
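To make the mechanism concrete, here's a minimal sketch of that tuning loop (my own toy example with a made-up hyperparameter sweep, nothing to do with the actual IMO setup):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Toy data standing in for the real task.
X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)

best_score, best_depth = -1.0, None
for depth in [2, 4, 8, 16]:  # "tweaking the architecture"
    model = RandomForestClassifier(max_depth=depth, random_state=0)
    model.fit(X_train, y_train)
    score = model.score(X_val, y_val)  # every comparison here leaks
    if score > best_score:             # validation info into the choice
        best_score, best_depth = score, depth

# best_depth is now partly a function of the validation set: re-split with
# a different random_state and you may pick a different depth, i.e. end up
# with a different model, even though you never trained on X_val directly.
print(best_depth, best_score)
```

Nothing in that loop touches the validation labels during training, and yet the final model still depends on them through the selection step. That's the contamination I mean.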
That said, the questions were only published a few days ago (the 2025 IMO just ended), and the model was kept in lockdown to avoid exactly this.