Comment by energy123

1 year ago

If OpenAI wanted the questions/solutions, there is going to be a reason for that. This data is not sitting in an unopened folder on Sam's computer.

There are a lot of ways you can use data to improve a model without directly training on it. A train/test validation loop, for example. Or as a wellspring for synthetic data generation. But all of these ways involve some level of data contamination, it's unavoidable.