
Comment by aithrowawaycomm

6 months ago

The subtlety here is that an almost-memorized picture of a lady is the same picture with a few artifacts, and an almost-memorized NYT article is the same article with a few words changed, but an almost-memorized computation or proof is likely to be plain wrong. So even if OpenAI's benchmark numbers came from data contamination (as I suspect), it still says something about o1's ability to execute a given problem-solving strategy without confabulating. It's just not what OpenAI wants you to think: much closer to Mathematica than to an actual mathematician.
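
A minimal sketch of that asymmetry: corrupt a few tokens of prose and you still have recognizably the same prose, but corrupt one recalled step of a computation and the final answer is plain wrong. All the tokens and numbers here are made up for illustration.

```python
import random

random.seed(0)

def corrupt(tokens, n_errors=2):
    """Flip a few tokens, simulating 'almost' memorization."""
    tokens = list(tokens)
    for i in random.sample(range(len(tokens)), n_errors):
        tokens[i] = "<err>"
    return tokens

# Prose: a few wrong words leave most of the content intact.
article = "the quick brown fox jumps over the lazy dog".split()
noisy = corrupt(article)
overlap = sum(a == b for a, b in zip(article, noisy)) / len(article)
print(f"prose overlap after corruption: {overlap:.0%}")  # still high

# Computation: one wrong recalled step poisons everything downstream.
steps = [3, 5, 7, 11]        # 'memorized' intermediate factors
true_answer = 1
for s in steps:
    true_answer *= s

noisy_steps = [3, 5, 8, 11]  # one step recalled almost-but-not-quite right
noisy_answer = 1
for s in noisy_steps:
    noisy_answer *= s

print(true_answer, noisy_answer, true_answer == noisy_answer)  # plain wrong
```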

> but an almost-memorized computation or proof is likely to be plain wrong

Hard to tell; I've never seen anyone test it. The model may almost-memorize a solution and then fill in the gaps at inference time, since it's still doing some 'thinking'. But the main point here is that there's a risk the model will spill out pieces of its training data verbatim, and OAI likely would not take that risk at a $100B+ valuation.
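
For what it's worth, the regurgitation risk is testable in principle: prompt the model with the first half of a suspected training document and measure how much of the true continuation comes back verbatim. A rough sketch below; `generate` is a stand-in for whatever model API is under test, not a real library call.

```python
def longest_verbatim_run(generated: str, reference: str) -> int:
    """Length of the longest run of consecutive matching tokens."""
    gen, ref = generated.split(), reference.split()
    best = run = 0
    for g, r in zip(gen, ref):
        run = run + 1 if g == r else 0
        best = max(best, run)
    return best

def contamination_probe(document: str, generate) -> int:
    """Prompt with the first half of a document, score the completion
    against the true second half. Long verbatim runs suggest memorization."""
    words = document.split()
    half = len(words) // 2
    prompt, continuation = words[:half], words[half:]
    completion = generate(" ".join(prompt))  # model under test
    return longest_verbatim_run(completion, " ".join(continuation))
```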