Comment by westurner
12 hours ago
What do other models trained on the same problems score? What about if they are RL'd to not reproduce things word for word?
Why do you think that the 2024 Putnam problems that they used to test were in the training data?
/? "Art of Problem Solving" Putnam https://www.google.com/search?q=%22Art+of+Problem+Solving%22...
From p.3 of the PDF:
> Curating Cold Start RL Data: We constructed our initial training data through the following process:
> 1. We crawled problems from Art of Problem Solving (AoPS) contests, prioritizing math olympiads, team selection tests, and post-2010 problems explicitly requiring proofs, totaling 17,503 problems.
> Why do you think that the 2024 Putnam problems that they used to test were in the training data?
They reference https://artofproblemsolving.com/community/c13_contest_collec... as the source of their scrape, and the Putnam problems are on that page under 'Undergraduate Contests'.
> Why do you think that the 2024 Putnam problems that they used to test were in the training data?
Putnam solutions can be found in multiple places online: https://kskedlaya.org/putnam-archive/, https://artofproblemsolving.com/community/c3249_putnam. These could have appeared in the training data of the base LLM, DeepSeek-V3.2-Exp, or as problems in the RL training set. They do not give further detail on which problems they selected from AoPS, and as the second link shows, the Putnam problems are there.
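If someone had access to the scraped set, one rough way to test this would be a verbatim-overlap check between each benchmark problem and the training documents. A minimal sketch, assuming plain-text problem statements; the function names, the 8-word window, and the idea of using word n-grams at all are my own assumptions, not anything described in the paper:

```python
import re

def ngrams(text, n=8):
    """Return the set of word n-grams in text (lowercased, punctuation dropped)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_fraction(test_problem, training_docs, n=8):
    """Fraction of the test problem's n-grams found in any training document.

    Hypothetical helper: 1.0 means every n-gram of the problem appears
    verbatim somewhere in the training docs, 0.0 means none do.
    """
    test_grams = ngrams(test_problem, n)
    if not test_grams:
        # Problem shorter than n words: nothing to compare.
        return 0.0
    train_grams = set()
    for doc in training_docs:
        train_grams |= ngrams(doc, n)
    return len(test_grams & train_grams) / len(test_grams)
```

A high score would only show that the problem statement was present verbatim. It would not catch paraphrased or translated copies, and it says nothing about whether solutions were seen by the base model.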