Comment by Dzugaru
12 hours ago
The most critical piece of information I couldn’t find: how many shots was this?
Could it tell by itself that a solution was correct (one-shot)? Or did it just have great math intuition and knowledge? How were the solutions validated if it was 10-100 shots?
The solutions were evaluated on their submitted output. You're allowed to use multiple 'shots' to produce the output, but just one submission per question. People are allowed this same affordance.
But were humans involved in picking which answer/shot to submit, or was it only AI?