Comment by rsanheim

10 months ago

`ETOOMANYMODELS`

Is there a reputable, non-blogspam site that offers a 'cheat sheet' of sorts for what models to use, in particular for development? Not just openAI, but across the main cloud offerings and feasible local models?

I know there are the benchmarks, and directories like huggingface, and you can get a 'feel' for things by scanning threads here or other forums.

I'm thinking more of something that provides use-case tailored "top 3" choices by collecting and summarizing different data points. For example:

* agent & tool based dev (cloud) - [top 3 models] * agent & tool based dev (local) - m1, m2, m,3 * code review / high level analysis - ... * general tech questions - ... * technical writing (ADRs, needs assessments, etc) - ...

Part of the problem is how quickly the landscape changes everyday, and also just relying on benchmarks isn't enough: it ignores cost, and more importantly ignores actual user experience (which I realize is incredibly hard to aggregate & quantify).

3 comments

rsanheim

departed 10 months ago

LMArena might have some of the information you are looking for. It offers rankings of LLM models across main cloud offerings, and I feel that its evaluation method, human prompting and voting, is closer to real-world use case and less prone to data contamination than benchmarks.

https://lmarena.ai/

In the "Leaderboard">"Language" tab, it lists the top models in various categories such as overall, coding, math, and creative writing.

In the "Leaderboard">"Price Analysis" tab, it shows a chart comparing models by cost per million tokens.

In the "Prompt-to-Leaderboard" tab, there is even an LLM to help you find LLMs -- you enter a prompt, and it will find the top models for your particular prompt.

ac29 10 months ago

> Is there a reputable, non-blogspam site that offers a 'cheat sheet' of sorts for what models to use, in particular for development?

Below is a spreadsheet I bookmarked from a previous HN discussion. Its information dense but you can just look at the composite scores to get a quick idea how things compare.

https://docs.google.com/spreadsheets/u/1/d/1foc98Jtbi0-GUsNy...

Carbonhell 10 months ago

I have been using this site: https://artificialanalysis.ai/ . It's still about benchmarks, and it doesn't do deep dives into specific use cases, but it's helpful to compare models for intelligence vs cost vs latency and other characteristics.