Comment by pplonski86

11 days ago

There are so many models, is there any website with list of all of them and comparison of performance on different tasks?

7 comments

pplonski86

Reubend 11 days ago

The post actually has great benchmark tables inside of it. They might be outdated in a few months, but for now, it gives you a great summary. Seems like Gemini wins on image and video perf, Claude is the best at coding, ChatGPT is the best for general knowledge.

But ultimately, you need to try them yourself on the tasks you care about and just see. My personal experience is that right now, Gemini Pro performs the best at everything I throw at it. I think it's superior to Claude and all of the OSS models by a small margin, even for things like coding.

Imustaskforhelp 11 days ago
I like Gemini Pro's UI over Claude so much but honestly I might start using Kimi K2.5 if its open source & just +/- Gemini Pro/Chatgpt/Claude because at that point I feel like the results are negligible and we are getting SOTA open source models again.
- wobfan 11 days ago
  
  > honestly I might start using Kimi K2.5 if its open source & just +/- Gemini Pro/Chatgpt/Claude because at that point I feel like the results are negligible and we are getting SOTA open source models again.
  Me too!
  > I like Gemini Pro's UI over Claude so much
  This I don't understand. I mean, I don't see a lot of difference in both UIs. Quite the opposite, apart from some animations, round corners and color gradings, they seem to look very alike, no?
  
  1 reply →

coffeeri 11 days ago

There is https://artificialanalysis.ai

XCSme 11 days ago

There are many lists, but I find all of them outdated or containing wrong information or missing the actual benchmarks I'm looking for.
I was thinking, that maybe it's better to make my own benchmarks with the questions/things I'm interested in, and whenever a new model comes out run those tests with that model using open-router.
pplonski86 11 days ago

Thank you! Exactly what I was looking for