Comment by scarmig
2 years ago
Most engineers and researchers at big tech companies wouldn't intentionally do that. The bigger problem is that public evals leak into the training data. You can try to cleanse your training data, but at some point it's inevitable.
Yeah, I'm not saying it was intentional (misleading shareholders would be the worse crime here). The issue is having these things in the training data without knowing, given how vast the dataset is.
> We filter our evaluation sets from our training corpus.
Page 5 of the report (they mention it again a little later):
https://storage.googleapis.com/deepmind-media/gemini/gemini_...