
Comment by mbesto

9 days ago

I wanna believe everything you say (because you generally are a credible person) but a few things don't add up:

1. Weakest ever LLM? This one really makes me scratch my head. For a period of time Llama was considered to be THE best. Furthermore, it's the third most used model on OpenRouter (in the past month): https://openrouter.ai/rankings?view=month

2. Ignoring DeepSeek for a moment: Llama 2 and 3 only require a special license from Meta if the products or services using the models have more than 700 million monthly active users. OpenAI, Claude, and Gemini are not only closed source but also require a license/subscription to even get started.

I've found the Llama 3 served by meta.ai to be quite weak for coding prompts; it gets confused by more complex tasks. Maybe it's a smaller model? I agree it's weaker than others of its generation.

Doesn't the OpenRouter ranking include pricing?

That makes it not really a measure of quality or performance, but of cost-effectiveness.

  • I mean it literally says on the page:

    "Shown are the sum of prompt and completion tokens per model, normalized using the GPT-4 tokenizer."

    Also, the ranking includes Llama usage served through cloud providers (for example, AWS Lambda).

    I get that OpenRouter is imperfect, but it's a good enough proxy to objectively weigh a claim that an LLM is "the weakest ever" (see the sketch below for what that tokenizer normalization actually does).
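
For anyone curious what that normalization means in practice, here's a minimal sketch, assuming the `tiktoken` library (which ships the GPT-4 tokenizer). The model names and prompt/completion pairs below are made up for illustration; the point is only that every model's traffic gets counted with the same tokenizer, so totals are comparable across models with different native vocabularies.

```python
# Minimal sketch of tokenizer-normalized usage, per the OpenRouter quote:
# "the sum of prompt and completion tokens per model, normalized using
# the GPT-4 tokenizer". Requires `pip install tiktoken`; the model names
# and text pairs below are hypothetical.
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4")  # GPT-4's BPE encoding

def normalized_tokens(prompt: str, completion: str) -> int:
    """Sum of prompt + completion tokens, counted with the GPT-4 tokenizer
    regardless of which model actually handled the request."""
    return len(enc.encode(prompt)) + len(enc.encode(completion))

# Hypothetical logged traffic: model -> list of (prompt, completion) pairs.
traffic = {
    "llama-3-70b": [("Explain BPE tokenization.",
                     "Byte-pair encoding merges frequent byte pairs...")],
    "gpt-4o": [("Explain BPE tokenization.",
                "BPE builds a vocabulary by iteratively merging...")],
}

usage = {m: sum(normalized_tokens(p, c) for p, c in pairs)
         for m, pairs in traffic.items()}
for model, tokens in sorted(usage.items(), key=lambda kv: -kv[1]):
    print(f"{model}: {tokens} normalized tokens")
```

Counting everything with one shared tokenizer removes the bias where a model with a less efficient tokenizer looks "busier" simply because it splits the same text into more tokens.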