Comment by departed
4 days ago
LMArena might have some of the information you are looking for. It ranks LLMs across the main cloud offerings, and I feel that its evaluation method, human prompting and voting, is closer to real-world use cases and less prone to data contamination than benchmarks.
In the "Leaderboard">"Language" tab, it lists the top models in various categories such as overall, coding, math, and creative writing.
In the "Leaderboard">"Price Analysis" tab, it shows a chart comparing models by cost per million tokens.
In the "Prompt-to-Leaderboard" tab, there is even an LLM to help you find LLMs -- you enter a prompt, and it finds the top models for that particular prompt.