← Back to context

Comment by departed

10 months ago

LMArena might have some of the information you are looking for. It offers rankings of LLM models across main cloud offerings, and I feel that its evaluation method, human prompting and voting, is closer to real-world use case and less prone to data contamination than benchmarks.

https://lmarena.ai/

In the "Leaderboard">"Language" tab, it lists the top models in various categories such as overall, coding, math, and creative writing.

In the "Leaderboard">"Price Analysis" tab, it shows a chart comparing models by cost per million tokens.

In the "Prompt-to-Leaderboard" tab, there is even an LLM to help you find LLMs -- you enter a prompt, and it will find the top models for your particular prompt.