Comment by rrr_oh_man

6 months ago

> if they used it in training it should be 100% hit.

Not necessarily, no.

A statistical model will attempt to minimise overall loss, generally speaking.

If it gets 100% accuracy on the training data, it's usually overfitting (hugging the data points too tightly, and thereby failing to predict real-life cases).
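
A quick toy example of that point (mine, not anything about the model in question): a high-degree polynomial can hit every training point exactly and still be useless on held-out data.

```python
# Minimal overfitting sketch: a degree-9 polynomial interpolates all 10 noisy
# training points (near-zero training error) but generalizes badly.
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.1, size=10)

# Degree 9 through 10 points -> memorizes the noise exactly.
coeffs = np.polyfit(x_train, y_train, deg=9)

x_test = np.linspace(0, 1, 100)
y_true = np.sin(2 * np.pi * x_test)
train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
test_err = np.mean((np.polyval(coeffs, x_test) - y_true) ** 2)
print(f"train MSE ~ {train_err:.2e}, test MSE ~ {test_err:.2e}")  # test MSE is far larger
```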

You are mostly right, but given the almost perfectly reconstructed images we've seen from training sets, it's obvious the model -can- memorize samples. In that case it would reproduce answers too close to the originals to be just 'accidental'. Should be easy to test.
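
Something like the following sketch would do for that test (the `query_model` call and the benchmark iterable are hypothetical stand-ins for whatever you're actually running against):

```python
# Hedged sketch: flag answers that are near-verbatim copies of the known
# reference answers from the suspected training/benchmark set.
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Ratio in [0, 1]; values near 1.0 suggest near-verbatim reproduction."""
    return SequenceMatcher(None, a, b).ratio()

def looks_memorized(model_answer: str, reference_answer: str, threshold: float = 0.95) -> bool:
    return similarity(model_answer, reference_answer) >= threshold

# Usage sketch (hypothetical names):
# for problem, reference in benchmark_items:
#     answer = query_model(problem)
#     if looks_memorized(answer, reference):
#         print("possible memorization:", problem[:60])
```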

My guess is the samples could be used to find a good-enough stopping point for the o1/o3 models, which is then hardcoded.

  • The subtlety here is that an almost-memorized picture of a lady is the same picture with a few artifacts, and an almost-memorized NYT article is the same article with a few words changed, but an almost-memorized computation or proof is likely to be plain wrong. So even if OpenAI's benchmark result was data contamination (as I suspect), it still says something about o1's ability to execute a given problem-solving strategy without confabulating. It's just not what OpenAI wants you to think: much closer to Mathematica than to an actual mathematician.

    • > but an almost-memorized computation or proof is likely to be plain wrong

      Hard to tell; I've never seen anyone try it. The model may almost-memorize and then fill the gaps at inference time, since it's still doing some 'thinking'. But the main point is that there is a risk the model will spill out pieces of its training data. OAI likely would not risk that at a $100B++ valuation.
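
      A rough sketch of how you might look for that spillage (my assumption, not anything OAI publishes): check for long verbatim n-gram overlaps between a model output and a document suspected to be in the training set; 13 tokens is the sort of window some decontamination checks have used.

      ```python
      # Minimal sketch: non-empty overlap at a long n is strong evidence of
      # verbatim regurgitation. 'known_source' stands in for any suspected
      # training document.
      def shared_ngrams(output: str, known_source: str, n: int = 13) -> set[tuple[str, ...]]:
          def ngrams(text: str) -> set[tuple[str, ...]]:
              toks = text.split()
              return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}
          return ngrams(output) & ngrams(known_source)
      ```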