Comment by energy123
6 months ago
If OpenAI wanted the questions/solutions, there is going to be a reason for that. This data is not sitting in an unopened folder on Sam's computer.
There are a lot of ways you can use data to improve a model without directly training on it. A train/test validation loop, for example. Or as a wellspring for synthetic data generation. But all of these ways involve some level of data contamination, it's unavoidable.
No comments yet
Contribute on Hacker News ↗