Comment by eugenekolo
19 hours ago
Without SWE-Bench though, how will AI models properly game their results to show ~5-10% gains each iteration?
Once a benchmark is known and there are billions of dollars on the line, obviously every company will game it.