Comment by jzymbaluk
4 months ago
How ironic that these LLM's appear to be overfitting to the benchmark scores. Presumably these researchers deal with overfitting every day, but can't recognize it right in front of them
4 months ago
How ironic that these LLM's appear to be overfitting to the benchmark scores. Presumably these researchers deal with overfitting every day, but can't recognize it right in front of them
I'm sure they all know it's happening. But the incentives are all misaligned. They get promotions and raises for pushing the frontier which means showing SOTA performance on benchmarks.