← Back to context

Comment by boring-human

8 hours ago

Yep. I think the idea that the benchmark is determinative is just as deluded as the notion that it should be unbreakable.

Benchmarks are on the honor system. Even the tightest benchmark can be cheated. If the benchmark is so secret and air-gapped that it can't be cheated by models, it can be cheated by its own authors. You can't use benchmarks to gate out cheating.

If you don't have the honor system in mind when you're reading scores, you're wasting your time. Is it some unknown outfit with wild claims? Is it connected to Epstein, Russia, the real estate "industry", or sleazeballing in general? Do they have previous history of ratgaming the numbers? Replace its scores with asterisks and move on.