Comment by gherkinnn
5 hours ago
I never looked into the details of these benchmarks; I live with the assumption that most benchmarks of any kind are gamed and useless.
What I do see in my own work and that of others around me is that Claude consistently outperforms Gemini and, to a lesser extent, Codex.
With Claude eating tokens at diminishing returns, concessions have to be made, and Codex is a usable middle ground.
I use Kimi in Kagi's Assistant for non-code or generic programming questions and am quite happy with its no-bullshit responses.