Comment by ares623

14 hours ago

How are we so confident that prices will fall? Isn't the exact opposite happening right now, during arguably the most critical part of this whole saga (pre-IPO, to make things appear as beautiful and as not-obviously-illegal as possible)? And the only reason they were "falling" previously was to fuel hypergrowth.

The growth aspect mentioned is that VCs are subsidizing the bill right now, so it is hard to know whether current demand would hold up without the subsidy. But even assuming demand only stayed constant (not even growing), you could expect token prices to be competed down. It is a commodity without a moat.

Now that we have pretty decent open source models, anyone can create a new business to supply more tokens. Sure, there's short-term scarcity (energy, GPUs, cooling), but this is a scale-up problem. More token demand = more data center build = more energy plant build. This downward pressure will also keep frontier private model prices in check.

Differentiation seems to be happening at the harness level, so we can expect token spend to become a metric that harnesses compete on and drive down for the customer (at least, I'm hoping tools in the application space don't keep token-based billing as their primary revenue stream).

These are not short-term hypergrowth forces, but a fundamental alignment of incentives.

Pricing on SOTA models probably won't fall; there's no reason for the frontier labs to lower their prices.

But we're seeing lots of open weight models that are either pretty close to SOTA or, more importantly, perfectly capable of doing all the low-level, token-intensive grunt work when writing code. Pair them with SOTA models for long-horizon task management, and you've got a very cost-effective system.
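To put rough numbers on the pairing idea, here is a minimal sketch of the blended price when only a fraction of tokens goes to the frontier model. The prices and the 10% split are made-up assumptions for illustration, not real quotes from any provider.

```python
def blended_cost_per_million(frontier_price, open_price, frontier_share):
    """Average price per million tokens when `frontier_share` of all tokens
    is sent to the frontier model and the rest to a cheaper open model."""
    if not 0.0 <= frontier_share <= 1.0:
        raise ValueError("frontier_share must be between 0 and 1")
    return frontier_share * frontier_price + (1.0 - frontier_share) * open_price


# Hypothetical prices in dollars per million tokens (assumptions, not quotes).
FRONTIER_PRICE = 10.0
OPEN_PRICE = 1.0

# Assumed split: 10% of tokens for planning on the frontier model,
# 90% of tokens for grunt work on the open model.
mixed = blended_cost_per_million(FRONTIER_PRICE, OPEN_PRICE, 0.10)
savings = 1.0 - mixed / FRONTIER_PRICE
print(f"blended: ${mixed:.2f}/M tokens, {savings:.0%} cheaper than frontier-only")
```

The takeaway is only that the blended price is dominated by wherever most of the tokens go, so the savings depend heavily on how much work can really be pushed to the cheaper model.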

The frontier labs have put little effort into cost-efficient inference because they don't need to, but folks like DeepSeek clearly have, and they've achieved some impressive cost improvements. Given DeepSeek's models give you 70% of the capabilities for 30% of the cost, expect people to start moving lots of workloads to providers that offer cheap inference for open models, and huge competition to appear to provide that cheap inference. It's truly commodity LLM inference.

In turn, expect more companies to focus on building inference-efficient models, because anyone who can build a model that provides 70% of SOTA capabilities for 10% of the token cost immediately eats up a huge amount of the available inference market.

Another factor in all this is that it's becoming increasingly clear that building custom agents/workflows for LLMs to operate in is required to get the best out of these models. That means people are implicitly building the infra needed to use multiple model types and evaluate workflow performance end-to-end. Which in turn means they have everything they need to plug in future, cheaper inference providers and quickly evaluate whether they can change their model provider.
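A minimal sketch of what that "plug in and evaluate" loop could look like, assuming each provider is wrapped as a plain prompt-to-text function. The stub providers, prices, and test cases below are all hypothetical stand-ins; a real setup would call actual APIs and use a proper end-to-end eval.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Assumed interface: a provider is any function from prompt to completion.
Provider = Callable[[str], str]
Case = Tuple[str, str]  # (prompt, substring expected in the answer)


def score(provider: Provider, cases: List[Case]) -> float:
    """Fraction of cases whose expected substring appears in the output."""
    passed = sum(1 for prompt, expected in cases if expected in provider(prompt))
    return passed / len(cases)


def cheapest_passing(providers: Dict[str, Provider], prices: Dict[str, float],
                     cases: List[Case], min_score: float) -> Optional[str]:
    """Name of the lowest-priced provider scoring at least min_score, or None."""
    passing = [name for name, fn in providers.items()
               if score(fn, cases) >= min_score]
    return min(passing, key=lambda name: prices[name]) if passing else None


def make_stub(answers: Dict[str, str]) -> Provider:
    """Stand-in for a real API client: canned answers, empty string otherwise."""
    return lambda prompt: answers.get(prompt, "")


CASES = [("2+2", "4"), ("3*3", "9"), ("10-7", "3"), ("8/2", "4")]
PROVIDERS = {
    "frontier": make_stub({"2+2": "4", "3*3": "9", "10-7": "3", "8/2": "4"}),
    "open_cheap": make_stub({"2+2": "4", "3*3": "9", "10-7": "3"}),
    "open_tiny": make_stub({"2+2": "4"}),
}
PRICES = {"frontier": 10.0, "open_cheap": 1.0, "open_tiny": 0.2}  # made up

choice = cheapest_passing(PROVIDERS, PRICES, CASES, min_score=0.75)
print("switch to:", choice)
```

Once the eval suite exists, trying a new provider is mostly a matter of adding one entry to the table and rerunning, which is the low switching cost the comment is pointing at.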

In the one direction the hardware continues to improve, new buildouts continue to come online, and methods for improving the parameter efficiency of models continue to be discovered.

In the other direction models continue to grow larger, new customers continue to arrive, and existing customers continue to find ever more creative ways to burn large quantities of tokens as the prices fall.

I doubt anyone can say with certainty where the equilibrium will be 1 or 5 years from now largely because (among many other things) it's impossible to predict how much of the current economy AI will end up eating. In general though the third party providers of open weights models are probably the most reliable data source available since they have little to no incentive to subsidize usage.

I don’t think we can extrapolate from current API pricing, but dramatically improving hardware in terms of cost:performance is the underlying reality.

To bet against that, you need to assume exponentially more expensive models every year.
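One way to see that argument is as a race between two compounding rates. The sketch below is a toy model under my own assumptions (a constant yearly hardware cost:performance gain and a constant yearly growth in compute per token); the rates used are placeholders, not measurements.

```python
def relative_cost(years, hw_gain_per_year, model_growth_per_year):
    """Cost per token of that year's model relative to today (1.0 = unchanged).

    hw_gain_per_year:      factor by which hardware cost:performance improves.
    model_growth_per_year: factor by which compute per token grows.
    """
    if hw_gain_per_year <= 0 or model_growth_per_year <= 0:
        raise ValueError("rates must be positive")
    return (model_growth_per_year / hw_gain_per_year) ** years


# Placeholder rates, for illustration only.
for growth in (1.0, 1.5, 2.0):
    cost = relative_cost(years=3, hw_gain_per_year=1.5,
                         model_growth_per_year=growth)
    print(f"model growth {growth}x/yr -> cost after 3 years: {cost:.2f}x")
```

In this toy model prices only hold steady or rise if models get more expensive to serve at least as fast as hardware gets cheaper, every single year.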

It is falling if you look elsewhere: DeepSeek made the 75% discount on their V4 models permanent. On one hand there are LLM improvements that make inference cheaper (e.g. MoE, hybrid attention); on the other hand we're getting more inference-focused chips that break the Nvidia monopoly.

I don't think a lot of people know this, but a cluster of GPUs can serve multiple clients without much of a drop in performance. E.g. in the worst case you band together with 6-16 people to run a 2-3 H100 server to host DeepSeek V4 Flash, or 4-6 to run Pro, and you're getting the same performance as if you ran it alone. This means a lot of companies can afford to throw 50-100k into their own LLM server cluster.
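For a feel of the per-person cost of sharing, here is a back-of-the-envelope sketch. It takes the 50-100k and 6-16 people figures from the comment above as inputs; pairing them this way, the dollar currency, the 36-month amortization, and the zero default for power/hosting are all my assumptions.

```python
def per_user_monthly(hardware_cost, users, months, monthly_opex=0.0):
    """Each user's monthly share of a jointly owned inference server.

    hardware_cost: upfront purchase price, amortized evenly over `months`.
    monthly_opex:  power, hosting, etc. per month (assumed 0 if unknown).
    """
    if users <= 0 or months <= 0:
        raise ValueError("users and months must be positive")
    return (hardware_cost / months + monthly_opex) / users


# Cheapest and priciest corners of the ranges quoted in the comment,
# amortized over an assumed 36 months with opex left out.
low = per_user_monthly(hardware_cost=50_000, users=16, months=36)
high = per_user_monthly(hardware_cost=100_000, users=6, months=36)
print(f"per user: ${low:.0f} to ${high:.0f} per month")
```

Whether that beats API pricing depends on how many tokens each person actually burns and on the opex this sketch leaves out.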

We're at a price point where, if you push it further, people will move. There's no real vendor lock-in: your agent config, skills, MCP servers etc. are all reusable with other models and harnesses. So unless you get all providers to collude on a price hike, you risk an exodus of customers.

Prices have fallen dramatically over the last few years. It’s just that our standards have increased because we are using AI in ways that were not possible with worse models. But for the same level of “intelligence” as we had a couple years ago, the prices are so much lower.