Comment by chvid

5 months ago

They leak the second they are used on a model behind an API, don't they?

As far as I can tell, the only way of comparing two models that cannot be easily gamed is having them both in open-weights form and then running them against a benchmark that was created after both models were created.
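
To make the date condition concrete, here is a minimal sketch of the check I have in mind. The function name and all dates are hypothetical, made up for illustration; it only tests the "benchmark is newer than both models" part and assumes you already hold the weights locally, so nothing is sent through an API.

```python
from datetime import date


def benchmark_postdates_models(benchmark_created: date, *model_created: date) -> bool:
    """Return True only if the benchmark was created strictly after every model."""
    return all(benchmark_created > created for created in model_created)


# Hypothetical creation dates, for illustration only.
model_a_created = date(2024, 3, 1)
model_b_created = date(2024, 6, 15)

# A benchmark written after both models: neither could have trained on it.
print(benchmark_postdates_models(date(2024, 9, 1), model_a_created, model_b_created))

# A benchmark written between the two: model B may have seen it, so the comparison is suspect.
print(benchmark_postdates_models(date(2024, 5, 1), model_a_created, model_b_created))
```

The comparison is strict on purpose: a benchmark published on the same day as a model is treated as possibly seen.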