Comment by bigstrat2003

6 months ago

Which is why I don't trust any of the benchmarks LLM enthusiasts point to when they say "see, the model is getting better". I have zero confidence that the AI companies are actually trying to make the system better, rather than treating the measure as a target.

SpaceNoodled 6 months ago

That reminds me of the time I found thread-safety-breaking changes in Intel's custom Android framework that were clearly designed to cheat benchmarks.