Slacker News

Comment by PunchTornado

3 days ago

What? Nobody looks at those benchmarks; you use whatever works for your task, in most cases either Gemini or Claude. Those benchmarks don't mean anything, as models overfit on them.

1 comment

esafak  3 days ago

Come on, the benchmarks do mean something, even if companies overfit them. Models are indisputably improving together with their benchmark scores.
