Comment by pplonski86

10 hours ago

There are so many models, is there any website with list of all of them and comparison of performance on different tasks?

The post actually has great benchmark tables inside of it. They might be outdated in a few months, but for now, it gives you a great summary. Seems like Gemini wins on image and video perf, Claude is the best at coding, ChatGPT is the best for general knowledge.

But ultimately, you need to try them yourself on the tasks you care about and just see. My personal experience is that right now, Gemini Pro performs the best at everything I throw at it. I think it's superior to Claude and all of the OSS models by a small margin, even for things like coding.

  • I like Gemini Pro's UI over Claude so much but honestly I might start using Kimi K2.5 if its open source & just +/- Gemini Pro/Chatgpt/Claude because at that point I feel like the results are negligible and we are getting SOTA open source models again.

    • > honestly I might start using Kimi K2.5 if its open source & just +/- Gemini Pro/Chatgpt/Claude because at that point I feel like the results are negligible and we are getting SOTA open source models again.

      Me too!

      > I like Gemini Pro's UI over Claude so much

      This I don't understand. I mean, I don't see a lot of difference in both UIs. Quite the opposite, apart from some animations, round corners and color gradings, they seem to look very alike, no?

      1 reply →

There is https://artificialanalysis.ai

  • There are many lists, but I find all of them outdated or containing wrong information or missing the actual benchmarks I'm looking for.

    I was thinking, that maybe it's better to make my own benchmarks with the questions/things I'm interested in, and whenever a new model comes out run those tests with that model using open-router.