Comment by gherkinnn

4 hours ago

I never looked into the details of these benchmarks; I live with the assumption that most benchmarks of any kind are gamed and useless.

What I do see in my own work and that of others around me is that Claude consistently outperforms Gemini and, to a lesser extent, Codex.

With Claude eating tokens at diminishing returns, concessions have to be made, and Codex is a usable middle ground.

I use Kimi in Kagi's Assistant for non-code or generic programming questions and am quite happy with its no-bullshit responses.