Comment by departed

1 year ago

LMArena might have some of the information you are looking for. It ranks LLMs from the major cloud offerings, and I feel that its evaluation method, human prompting and voting, is closer to real-world use cases and less prone to data contamination than static benchmarks.

https://lmarena.ai/

In the "Leaderboard">"Language" tab, it lists the top models in various categories such as overall, coding, math, and creative writing.

In the "Leaderboard">"Price Analysis" tab, it shows a chart comparing models by cost per million tokens.

In the "Prompt-to-Leaderboard" tab, there is even an LLM to help you find LLMs: you enter a prompt, and it finds the top models for that particular prompt.
