Comment by kalkin
3 months ago
Scale AI wrote a paper a year ago comparing various models' performance on benchmarks to their performance on similar but held-out questions. Generally the closed-source models performed better, and Mistral came out looking pretty bad: https://arxiv.org/pdf/2405.00332