Comment by EagnaIonat
6 months ago
> What's even more suspicious is that these tweets from Elliot Glazer indicate that they are still "developing" the hold-out set,
There is nothing suspicious about this, and the wording seems to be incorrect.
A hold-out set is a percentage of the overall data that is used to test a model; the model is simply not trained on it. Model developers normally have full access to it.
There is nothing inherently wrong with training on a full or partial hold-out set. It just means you have made a different split and trained again.
The confusion I see here is that people are equating a hold-out set with a blind set. A blind set is test data that the model developers (and the model) cannot see.
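To make the distinction concrete, here is a minimal sketch (my own illustration, not from the article; it assumes scikit-learn and uses synthetic stand-in data):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data, purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = rng.integers(0, 2, size=1000)

# Hold-out set: a split the developers fully control and can inspect.
# Nothing stops them from re-splitting later and training on it too.
X_train, X_holdout, y_train, y_holdout = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LogisticRegression().fit(X_train, y_train)
print("hold-out accuracy:", model.score(X_holdout, y_holdout))

# A blind set, by contrast, never appears in this script at all:
# a third party keeps the data and only reports a score back.
```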
Even so, blind sets can go stale after a few runs, and there is nothing wrong with ingesting a stale blind set, as long as you have a new blind set to run against.
Trying to game blind-set tests is nothing new, and it gets found out very quickly.
What I took from the original article is that the blind set is likely unbalanced, and the model answered more of the easier questions than the hard ones.
> The confusion I see here is that people are equating a hold-out set with a blind set. A blind set is test data that the model developers (and the model) cannot see.
What on earth? This is from Tamay Besiroglu at Epoch:
> OpenAI has also been fully supportive of our decision to maintain a separate, unseen-by-OpenAI, hold-out set that enables us to independently verify model capabilities.
So this "confusion" is because Epoch AI specifically told people it was a blind set! Despite the condescending tone, your comment is just plain wrong.
Your quote literally says hold-out set.
It also literally says "unseen-by-OpenAI".