Comment by hyperbovine

1 year ago

Is that really so surprising given what we know about how these models actually work? I feel vindicated on behalf of myself and all the other commenters who have been mercilessly downvoted over the past three years for pointing out the obvious fact that next token prediction != reasoning.

aoeusnth1 1 year ago

Gemini 2.5 Pro scores 25%.

It’s just a much harder math benchmark which will fall by the end of next year just like all the others. You won’t be vindicated.

hyperbovine 1 year ago

Bold claim! Let's see what that 25% is. I guarantee it is the portion of the exam which is trivially answerable if you have a stored database of all previous math exams ever written to consult.
- aoeusnth1 1 year ago
  
  0% of the exam is trivially answerable.
  The entire point of USAMO problems is that they demand novel insight and rigorous, original proofs. They are intentionally designed not to be variations of things you can just look up. You have to reason your way through, step by logical step.
  Getting 25% (~11 points) is exceptionally difficult. That often means fully solving one problem and maybe getting solid partial credit on another. The median score is often in the single digits.
  