Comment by optimalsolver

1 year ago

>OpenAI has data access to much but not all of the dataset

Their head mathematician says they have the full dataset, except a holdout set which they're currently developing (i.e. doesn't exist yet):

https://www.reddit.com/r/singularity/comments/1i4n0r5/commen...


menaerus 1 year ago

Thanks for the link. So the holdout set has yet to be used to verify the 25% claim. He also says he doesn't believe OpenAI would sabotage themselves by gaming their internal benchmark results, since that would be easily exposed, either by the results on the holdout set or by the public repeating the benchmarks themselves. Seems reasonable to me.

optimalsolver 1 year ago
>the public repeating the benchmarks themselves
The public has no access to this benchmark.
In fact, everyone thought it was all locked up in a vault at Epoch AI HQ, but it looks like Sam Altman has a copy on his bedside table.
- menaerus 1 year ago
  
  Perhaps what he meant is that the public will be able to benchmark the model themselves by throwing math problems of varying difficulty at it, not necessarily the FrontierMath benchmark. It should become pretty obvious whether they were faking the results.
  