Comment by Ocha

3 days ago

Nobody believes Elon anymore.

Hm, impartial benchmarks are independent of Elon's claims?

  • Impartial benchmarks are great, unless (1) you have so many to choose from that you can game them (which is still true even if the benchmark makers themselves are absolutely beyond reproach), or (2) there's a difference between what you're testing and what you care about.

    Goodhart's Law means (2) is almost always true.

    As it happens, we also have a lot of AI benchmarks to choose from.

    Unfortunately, this means every model basically has a vibe score right now, because the truly independent tests saturate quickly into the "ooh shiny" region of the graph. Even the people working on e.g. the ARC-AGI benchmark don't think their own test is the last word.

  • "Impartial" how? Do you have the training data? Are you auditing to make sure they're not few-shotting the benchmarks?
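
The cherry-picking worry in point (1) can be illustrated with a small simulation. This is only a sketch under assumptions that are not from the thread: every benchmark is modelled as measuring the same true skill (fixed at 0) plus independent Gaussian noise, and the function name `cherry_picked_gain` and its parameters are made up for illustration. It estimates how much better a model looks if you report only its best benchmark instead of the average across all of them.

```python
import random
import statistics


def cherry_picked_gain(n_benchmarks, noise_sd=2.0, trials=2000, seed=0):
    """Average gap between the best benchmark score and the mean score.

    Hypothetical model: each benchmark score is true skill (0.0) plus
    independent Gaussian noise with standard deviation `noise_sd`.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    gaps = []
    for _ in range(trials):
        scores = [rng.gauss(0.0, noise_sd) for _ in range(n_benchmarks)]
        # Reporting only the best score versus reporting the honest average.
        gaps.append(max(scores) - statistics.mean(scores))
    return statistics.mean(gaps)


if __name__ == "__main__":
    for n in (1, 5, 20, 50):
        print(n, round(cherry_picked_gain(n), 2))
```

Under these assumptions the gap is zero with a single benchmark and grows as more benchmarks become available to choose from, even though every individual benchmark is unbiased. Real benchmarks are correlated and measure different things, so the size of the effect here should not be read as an estimate for any actual model.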