← Back to context

Comment by dylan604

9 days ago

One of these things is not like the others. $8.50/1000?? Any chance that's a typo? Otherwise, for someone that has no experience with LLM pricing models, why is Llama 90b so expensive?

It's not uncommon when using brokers to see outliers like this. What happens basically is that some models are very popular and have many different providers, and are priced "close to the metal" since the routing will normally pick the cheapest option with the specified requirements (like context size). But then other models - typically more specialized ones - are only hosted by a single provider, and said provider can then price it much higher than raw compute cost.

E.g. if you look at https://openrouter.ai/models?order=pricing-high-to-low, you'll see that there are some 7B and 8B models that are more expensive than Claude Sonnet 3.7.

  • I'll add that some, big-name suppliers with big models might be running at or near a loss on purpose to draw in customers. That behavior is often encouraged by funders who gave them over $100 million to capture the market.

    Their theory is they can raise prices once their competitors go out of business. The companies open-sourcing pretrained models are countering that. So, we see a mix of huge models underpriced by scheming companies and open-source models priced for inference with free market principles.

That was the cost when we ran Llama 90b using TogetherAI. But it's quite hard to standardize, since it depends a lot on who is hosting the model (i.e. together, openrouter, grok, etc.)

I think in order to run a proper cost comparison, we would need to run each model on an AWS gpu instance and compare the runtime required.