← Back to context

Comment by iLoveOncall

2 months ago

Hasn't it been proven many times that all those companies cheat on benchmarks?

I personally couldn't care less about them, especially when we've seen many times that the public's perception is absolutely not tied to the benchmarks (Llama 4, the recent OpenAI model that flopped, etc.).

I don't think there's any real evidence that any of the major companies are going out of their way to cheat the benchmarks. Problem is that, unless you put a lot of effort into avoiding contamination, you will inevitably end up with details about the benchmark in the training set.