Comment by vanuatu
5 days ago
i worked on one of the benchmarks typically found in new model releases
this benchmark's methodology looks very good. a cog researcher checking the data themselves is very high signal (not scalable, so don't take the benchmark as gospel, but directionally good)