Comment by lolinder

6 months ago

They have an oral agreement that OpenAI won't use the benchmark in training. Which means that, first and foremost, you have to consider the possibility that OpenAI broke that agreement and actually included the problems in the training set. Even if they didn't, the fact that they had the problems means they could have selectively curated the training data to specialize in solving that class of problem, while still technically keeping the verbal agreement.

So, yeah, the benchmark needs to be treated as essentially worthless at this point.

If OpenAI wanted the questions/solutions, there is going to be a reason for that. This data is not sitting in an unopened folder on Sam's computer.

There are a lot of ways you can use data to improve a model without directly training on it: a train/test validation loop, for example, or as a wellspring for synthetic data generation. But all of these ways involve some level of data contamination; it's unavoidable.
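To make the validation-loop point concrete, here is a minimal, purely illustrative sketch (not a description of OpenAI's actual pipeline): even if a benchmark is never in the training set, using it to *select* among candidate models leaks information about it into the final model, and the reported score is biased upward. All names and numbers below are assumptions for the toy simulation.

```python
import random

random.seed(0)

def true_skill():
    """A candidate model's real ability on the task class (hypothetical)."""
    return random.gauss(0.5, 0.05)

def benchmark_score(skill):
    """Noisy measurement of that skill on the held-out benchmark."""
    return skill + random.gauss(0, 0.05)

# Train many candidates (different data mixes, hyperparameters, etc.).
candidates = [true_skill() for _ in range(50)]

# Selection loop: keep whichever candidate scores best on the benchmark.
# No candidate ever "trains on" the benchmark, yet the choice depends on it.
scores = [benchmark_score(s) for s in candidates]
best = max(range(len(candidates)), key=lambda i: scores[i])

# The winning score tends to overstate true skill: the max of noisy
# measurements is biased upward (the "winner's curse").
print(f"reported benchmark score: {scores[best]:.3f}")
print(f"true skill of that model: {candidates[best]:.3f}")
```

Run enough selection rounds like this and the benchmark stops measuring generalization; it measures how hard you optimized against it.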