
Comment by numba888

6 months ago

If they used it in training it should be a 100% hit. Most likely they used it to verify and tune parameters.

> If they used it in training it should be a 100% hit.

Not necessarily, no.

A statistical model will attempt to minimise overall loss, generally speaking.

If it gets 100% accuracy on the training data, it's usually overfit (hugging the data points too tightly, and thereby failing to predict real-life cases).
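A minimal sketch of that train/test gap, using a pure memorizer (1-nearest-neighbour lookup) on synthetic data with label noise. All names and the 20% noise rate are made up for illustration:

```python
import random

random.seed(0)

# Synthetic noisy binary classification: the true label depends only on x.
def make_data(n):
    data = []
    for _ in range(n):
        x = random.random()
        true_label = 1 if x > 0.5 else 0
        # 20% label noise: a memorizer will learn the noise too.
        label = true_label if random.random() > 0.2 else 1 - true_label
        data.append((x, label))
    return data

train, test = make_data(200), make_data(200)

# A pure memorizer: return the label of the closest training point.
def predict(x):
    return min(train, key=lambda p: abs(p[0] - x))[1]

def accuracy(data):
    return sum(predict(x) == y for x, y in data) / len(data)

print(f"train accuracy: {accuracy(train):.2f}")  # 1.00 -- every point is its own nearest neighbour
print(f"test accuracy:  {accuracy(test):.2f}")   # noticeably lower: the memorized noise doesn't transfer
```

The memorizer scores perfectly on data it has seen and markedly worse on fresh data, which is why a 100% training-set hit rate signals overfitting rather than skill.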

  • You are mostly right. But seeing almost perfectly reconstructed images from the training set, it's obvious the model *can* memorize samples. In that case it would reproduce the answers too close to the original to be just 'accidental'. Should be easy to test.

    My guess: the samples could be used to find a good-enough stopping point for the o1 and o3 models, which is hardcoded.

    • The subtlety here is that an almost-memorized picture of a lady is the same picture with a few artifacts, and an almost-memorized NYT article is the same article with a few words changed, but an almost-memorized computation or proof is likely to be plain wrong. So even if OpenAI's benchmark was data contamination (as I suspect) it still says something about o1's abilities to execute a given problem-solving strategy without confabulating. It's just not what OpenAI wants you to think: much closer to Mathematica than an actual mathematician.

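The "easy to test" part could be sketched as an n-gram overlap check against the known reference answers: verbatim reproductions score near 1.0, independent work scores near 0. This is a toy heuristic with made-up names, not how contamination audits are actually run:

```python
# Toy contamination check: fraction of the reference's word 5-grams
# that also appear in the model's answer.
def ngrams(text, n=5):
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap(answer, reference, n=5):
    a, r = ngrams(answer, n), ngrams(reference, n)
    return len(a & r) / len(r) if r else 0.0

reference = "let n be the smallest prime such that n squared plus one is divisible by seven"
verbatim = reference
paraphrase = "take the least prime whose square incremented by one is a multiple of seven"

print(overlap(verbatim, reference))    # 1.0 -- suspiciously close to the original
print(overlap(paraphrase, reference))  # 0.0 -- plausibly independent
```

A memorizing model would cluster near the verbatim end on contaminated items and near the paraphrase end elsewhere, which is the asymmetry the comment above suggests testing for.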

Had they let it hit 100%, it would have been obvious they had the data.

They've surely been careful to avoid that, by only using a portion of it or some other technique.