Comment by idiliv

2 years ago

I'm curious how they evaluated model quality. The only information I could find is "Quality: Index based on several quality benchmarks".

The quality index is an equally weighted combination of the normalized values of Chatbot Arena Elo score, MMLU, and MT-Bench.

We have a bit more information in the FAQ: https://artificialanalysis.ai/faq but thanks for the feedback, will look into expanding more on how the normalization works. We are thinking of ways to improve this generalized metric.
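To make the idea concrete, here is a rough sketch of how an equally weighted index of normalized scores could be computed. This is not the site's actual method: min-max normalization across the compared models is an assumption (the normalization is not specified in this thread), and the function names, benchmark keys, and example numbers are all made up for illustration.

```python
# Hypothetical sketch: equally weighted index over normalized benchmark scores.
# ASSUMPTION: min-max normalization across the models being compared.
# The real normalization used by the site is not described in this thread.

BENCHMARKS = ["arena_elo", "mmlu", "mt_bench"]  # illustrative key names


def min_max_normalize(values):
    """Rescale values to [0, 1]; all-equal inputs map to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def quality_index(scores):
    """scores maps model name -> {benchmark name -> raw score}.

    Returns model name -> mean of its normalized benchmark scores,
    so every benchmark carries the same weight.
    """
    models = list(scores)
    normalized = {m: [] for m in models}
    for bench in BENCHMARKS:
        column = min_max_normalize([scores[m][bench] for m in models])
        for model, value in zip(models, column):
            normalized[model].append(value)
    return {m: sum(vals) / len(vals) for m, vals in normalized.items()}


# Made-up numbers, purely to show the shape of the input.
example_scores = {
    "model_a": {"arena_elo": 1200, "mmlu": 0.80, "mt_bench": 9.0},
    "model_b": {"arena_elo": 1100, "mmlu": 0.70, "mt_bench": 8.0},
    "model_c": {"arena_elo": 1000, "mmlu": 0.60, "mt_bench": 7.0},
}

if __name__ == "__main__":
    for model, index in quality_index(example_scores).items():
        print(f"{model}: {index:.2f}")
```

Normalizing first matters because the raw scales differ a lot (Elo is in the thousands, MMLU is a fraction, MT-Bench is out of 10), so a plain average of raw values would be dominated by Elo. One caveat of min-max in particular is that the index shifts whenever a model is added to or removed from the comparison set.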

A sticking point is that quality can, of course, be thought of from different perspectives: reasoning, knowledge (retrieval), use-case-specific ability (coding, math, readability), etc. This is why we show individual scores on the home page and the models page: https://artificialanalysis.ai/models