Comment by Racing0461
2 years ago
How do we know the model wasn't pretrained on the evaluations to get higher scores? In general, but especially for profit-seeking corporations, this measure might become a target and stop being meaningful.
Most engineers and researchers at big tech companies wouldn't intentionally do that. The bigger problem is that public evals leak into the training data. You can try to cleanse your training data, but at some point it's inevitable.
Yeah, I'm not saying it was intentional (misleading shareholders would be the worse crime here). The issue is having these things in the training data without knowing, because of how vast the dataset is.
> We filter our evaluation sets from our training corpus.
Page 5 of the report (they mention it again a little later)
https://storage.googleapis.com/deepmind-media/gemini/gemini_...
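For what it's worth, the usual way to do this kind of filtering is n-gram-overlap decontamination: index the n-grams of every eval example and drop any training document that shares one. A minimal sketch in Python, assuming word-level 13-grams (a common choice; the report doesn't specify Gemini's exact method, so the function names and threshold here are illustrative):

```python
def ngrams(text: str, n: int = 13) -> set[tuple[str, ...]]:
    """Return the set of word-level n-grams in a text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def build_eval_index(eval_examples: list[str], n: int = 13) -> set[tuple[str, ...]]:
    """Collect every n-gram appearing in any evaluation example."""
    index: set[tuple[str, ...]] = set()
    for example in eval_examples:
        index |= ngrams(example, n)
    return index


def is_contaminated(document: str, eval_index: set[tuple[str, ...]], n: int = 13) -> bool:
    """A training document counts as contaminated if it shares any n-gram with an eval example."""
    return not ngrams(document, n).isdisjoint(eval_index)


def filter_corpus(corpus: list[str], eval_examples: list[str], n: int = 13) -> list[str]:
    """Drop training documents that overlap with the evaluation set."""
    eval_index = build_eval_index(eval_examples, n)
    return [doc for doc in corpus if not is_contaminated(doc, eval_index, n)]
```

Even with something like this, near-duplicates (paraphrases, translations, reformatted copies) slip through, which is why "at some point it's inevitable" is a fair summary.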