Comment by avianlyric

8 hours ago

Pricing on SToA models probably won’t fall; there’s no reason for the frontier labs to lower their prices.

But we’re seeing lots of open-weight models that are either pretty close to SToA or, more importantly, perfectly capable of doing all the low-level, token-intensive grunt work when writing code. Pair them with SToA models for long-horizon task management and you’ve got a very cost-effective system.
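To make the pairing idea concrete, here is a minimal sketch of routing by task kind. Everything in it is an assumption for illustration: `Model` is a hypothetical callable that wraps whatever client you actually use, and the two task kinds (`"plan"` and `"grunt"`) are made-up labels, not anything a real framework defines.

```python
from typing import Callable, Dict

# Hypothetical: a model is just "prompt in, text out". Wrap your real client in this.
Model = Callable[[str], str]


def make_router(planner: Model, worker: Model) -> Callable[[str, str], str]:
    """Send long-horizon planning to the expensive model and grunt work to the cheap one."""
    routes: Dict[str, Model] = {"plan": planner, "grunt": worker}

    def route(kind: str, prompt: str) -> str:
        if kind not in routes:
            raise ValueError(f"unknown task kind: {kind!r}")
        return routes[kind](prompt)

    return route


# Stub models so the sketch runs without any provider.
route = make_router(
    planner=lambda p: "frontier:" + p,
    worker=lambda p: "open:" + p,
)
```

In practice the interesting part is deciding the task kind, which this sketch leaves to the caller.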

The frontier labs have put little effort into cost-efficient inference because they don’t need to, but folks like DeepSeek clearly have, and they’ve achieved some impressive cost improvements. Given that DeepSeek’s models give you 70% of the capabilities for 30% of the cost, expect people to start moving lots of workloads to providers that offer cheap inference for open models, and expect huge competition to provide that cheap inference. It’s truly commodity LLM inference.

In turn, expect more companies to focus on building inference-efficient models, because anyone who can build a model that provides 70% of SToA capabilities for 10% of the token cost immediately eats up a huge share of the available inference market.
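The arithmetic behind those ratios, as a rough sketch. The numbers are the ballpark figures from this comment, normalised so the frontier model is 1.0 capability at 1.0 cost; they are not measurements, and the model names are hypothetical labels. On those assumptions the 70%/30% model gives roughly 2.3x the capability per unit of cost, and a 70%/10% model roughly 7x.

```python
def capability_per_cost(capability: float, cost: float) -> float:
    """Relative capability delivered per unit of relative token cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return capability / cost


# (capability, cost) relative to a frontier model. Illustrative figures only.
MODELS = {
    "frontier": (1.0, 1.0),
    "open_70_30": (0.7, 0.3),
    "hypothetical_70_10": (0.7, 0.1),
}


def rank_by_value(models: dict) -> list:
    """Model names ordered from best to worst capability per unit of cost."""
    return sorted(
        models,
        key=lambda name: capability_per_cost(*models[name]),
        reverse=True,
    )


ranking = rank_by_value(MODELS)
```

This treats capability as a single number, which is a big simplification; it only shows why the cheaper model wins wherever 70% is good enough.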

Another factor in all this is that it’s becoming increasingly clear that building custom agents/workflows for LLMs to operate in is required to get the best out of these models. That means people are implicitly building the infra needed to use multiple model types and evaluate workflow performance end-to-end, which in turn means they have everything they need to plug in future, cheaper inference providers and quickly evaluate whether they can change their model provider.
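A minimal sketch of what "quickly evaluate" could look like, assuming you already have end-to-end tasks with pass/fail checks. `Provider`, `Task`, `pass_rate`, `can_switch`, and the `max_drop` tolerance are all hypothetical names for illustration, not any real library's API.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical: a provider is "prompt in, text out". Wrap your real client in this.
Provider = Callable[[str], str]


@dataclass(frozen=True)
class Task:
    prompt: str
    check: Callable[[str], bool]  # True if the workflow output is acceptable


def pass_rate(provider: Provider, tasks: Sequence[Task]) -> float:
    """Fraction of tasks whose output passes its check."""
    if not tasks:
        raise ValueError("need at least one task")
    passed = sum(1 for t in tasks if t.check(provider(t.prompt)))
    return passed / len(tasks)


def can_switch(
    current: Provider,
    candidate: Provider,
    tasks: Sequence[Task],
    max_drop: float = 0.05,
) -> bool:
    """True if the candidate's pass rate is within max_drop of the current provider's."""
    return pass_rate(candidate, tasks) >= pass_rate(current, tasks) - max_drop


# Stub tasks and providers so the sketch runs offline.
tasks = [
    Task("a", lambda out: out == "A"),
    Task("b", lambda out: out == "B"),
]
good_provider = str.upper
bad_provider = lambda prompt: prompt
```

With a harness like this already in place, trying a cheaper provider is mostly a matter of writing one more wrapper and rerunning the tasks.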
