Comment by stingraycharles
13 hours ago
“no harness at all” might be an issue, though, as these types of benchmarks are often gamed, and models then perform great on them without actually being better models.