Comment by mmmllm

17 hours ago

This comes the same week Oracle is forecasting huge data center demand and its stock is rallying. If these 10x efficiency gains hold true, they could lead to a lot less demand for Nvidia, Oracle, CoreWeave, etc.

https://en.wikipedia.org/wiki/Jevons_paradox

  • Sure, but where is the demand going to come from? LLMs are already in every Google search, in WhatsApp/Messenger, throughout Google Workspace, Notion, Slack, etc. ChatGPT already has a billion users.

    Plus penetration is already very high in the areas where they are objectively useful: programming, customer care etc. I just don't see where the 100-1000x demand comes from to offset this. Would be happy to hear other views.

    • As plenty of others have mentioned here, if inference were 100x cheaper, I would run 200x inference.

      There are so many things you can do with long-running, continuous inference.


    • If LLMs were next to free and faster, I would personally increase my consumption 100x or more, and I'm only in the "programming" category.

    • We are nearly infinitely far away from saturating compute demand for inference.

      Case in point: I'd like something that assesses, in real time, all the sensors and API endpoints of the stuff in my home and, as needed, bubbles up summaries, diaries, and emergency alerts. Right now that's probably a single H200, and well out of my "value range". The number of people in the world who do this at scale today is almost certainly fewer than 50k.

      If that inference cost went to 1%, then a) I'd be willing to pay it, and b) there'd be enough of a market that a company could make money integrating a bunch of tech into a simple deployable stack, and therefore c) a lot more people would want it, likely enough to drive more than 50k H200s' worth of inference demand. (A rough sketch of such a monitoring loop follows at the end of this list.)


    • > Plus penetration is already very high in the areas where they are objectively useful: programming, customer care etc.

      Is that true? The BLS estimate of the number of customer service reps in the US is 2.8M (https://www.bls.gov/oes/2023/may/oes434051.htm), and while I'll grant that's from 2023, I would wager a lot that the number is still above 2M. Similarly, the overwhelming majority of software developers haven't lost their jobs to AI.

      A sufficiently advanced LLM will be able to replace most, if not all of those people. Penetration into those areas is very low right now relative to where it could be.


    • We've seen several orders of magnitude of improvement in CPUs over the years, yet you try to do anything now and the interaction is often slower than it was on a ZX Spectrum. We can easily absorb an order-of-magnitude improvement, and that's only going to create more demand. We can/will have models thinking for us all the time, in parallel, bothering us only with findings/final solutions. There is no limit here, really.

    • I'm already throughput-capped on my output via Claude. If you gave me 10x the tokens/s I'd ship at least twice as much value (at good-enough-for-the-business quality, to be clear).

      There are plenty of use cases where the models are not smart enough to solve the problem yet, but there is very obviously a lot of value available to be harvested from maturing and scaling out just the models we already have.

      Concretely, the $200/mo and $2k/mo offerings will be adopted by more prosumer and professional users as the product experience becomes more mature.

    • The difference in usefulness between ChatGPT free and ChatGPT Pro is significant. Turning up compute for each embedded usage of LLM inference will be a valid path forward for years.

    • The problem is that unless you have efficiency improvements that radically alter the shape of the compute vs smartness curve, more efficient compute translates to much smarter compute at worse efficiency.

    • I mean 640KB should be enough for anyone too but here we are. Assuming LLMs fulfill the expected vision, they will be in everything and everywhere. Think about how much the internet has permeated everyday life. Even my freaking toothbrush has WiFi now! 1000x demand is likely several orders of magnitude too low in terms of the potential demand (again, assuming LLMs deliver on the promise).
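
To make the home-monitoring idea a few comments up a bit more concrete, here is a minimal Python sketch of that kind of continuous-inference loop: poll a handful of sensor endpoints, ask a locally hosted model to triage the readings, and only surface what isn't boring. The sensor URLs, the OpenAI-compatible endpoint on localhost:8000, and the model name are illustrative assumptions, not anyone's actual setup.

```python
# Minimal sketch of a continuous home-monitoring inference loop.
# Assumptions (not from the thread): sensors expose JSON over local HTTP,
# and an OpenAI-compatible model server is running on localhost:8000.
import json
import time

import requests

SENSOR_ENDPOINTS = {                     # hypothetical local REST endpoints
    "thermostat": "http://192.168.1.20/api/state",
    "front_door": "http://192.168.1.21/api/state",
    "smoke_alarm": "http://192.168.1.22/api/state",
}
LLM_URL = "http://localhost:8000/v1/chat/completions"   # assumed local server
MODEL = "local-model"                                    # placeholder name


def read_sensors() -> dict:
    """Collect the latest reading from each sensor, marking unreachable ones."""
    readings = {}
    for name, url in SENSOR_ENDPOINTS.items():
        try:
            readings[name] = requests.get(url, timeout=2).json()
        except requests.RequestException:
            readings[name] = {"error": "unreachable"}
    return readings


def assess(readings: dict) -> str:
    """Ask the model to triage the readings into IGNORE / SUMMARY / ALERT."""
    prompt = (
        "You monitor a home. Given these sensor readings, reply with one line "
        "starting with IGNORE, SUMMARY, or ALERT, followed by a short note:\n"
        + json.dumps(readings)
    )
    resp = requests.post(
        LLM_URL,
        json={"model": MODEL, "messages": [{"role": "user", "content": prompt}]},
        timeout=60,
    )
    return resp.json()["choices"][0]["message"]["content"]


if __name__ == "__main__":
    while True:                          # one inference call per minute, forever
        verdict = assess(read_sensors())
        if not verdict.startswith("IGNORE"):
            print(time.strftime("%Y-%m-%d %H:%M"), verdict)
        time.sleep(60)
```

Even at one call per minute this is a trivial load; the "50k H200s" point in the thread is presumably about doing this with far richer context (camera frames, long histories) at much tighter intervals.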

I'm not going to speculate about what might be ahead with regard to Oracle's forecasting of data center demand, but on the idea of efficiency gains leading to lower demand: don't you think something like Jevons paradox might apply here?

People said the same thing about DeepSeek-R1, and nothing changed.

If you come up with a way to make the current generation of models 10x more efficient, then everyone just moves to train a 10x bigger model. There isn't a model size at which the players will be satisfied and stop going 10x bigger. Not as long as scaling still pays off (and it does today).

Absolutely not; the trends have proven that people will just pay for the best quality they can get, and keep paying roughly the same money.

Every time a new model is released, people abandon the old, lower quality model (even when it’s priced less), and instead prefer to pay the same for a better model.

The same will happen with this.

  • Sure, but the money people are paying right now isn't that much in the grand scheme of things. OpenAI is expecting $13bn in revenue this year; AWS made over $100bn last year. So unless they pay a lot more, or find customers outside of the programmers, designers, etc. who are willing to pay for the best quality, I don't see how it grows as fast as it needs to. (I'm not saying it won't increase, just not at the rate expected by the data center providers.)

  • For early adopters, yes, but many systems have been running as "good enough" without any kind of updates for a long time. For many use cases it only needs to reach the point where accuracy is good enough, and then it will be set-and-forget. I disagree with that approach, but that's what you find in the wild.

  • The best quality you can get is at odds with the best speed you can get. There are lots of people (especially with specific use cases) who will pay for the best speed they can get that is high enough quality.

If someone had to bet on an AI crash, which I imagine would lead to unused datacentres and cheap GPUs, how would they invest their winnings to exploit those resources?

  • If the price of inference drops through the floor, all the AI wrapper companies become instantly more valuable. Cursor is living on borrowed time because their agents suck and they're coasting on first-mover advantage with weak products in general, but their position would get much better with cheap inference.

  • Buy the near-winners at the application layer. When computing costs shrink, usage expands.

No. The gains in inference and training efficiency are going to be absorbed by frontier LLM labs becoming more willing to push more demanding and capable models to end users, increase reasoning token budgets, etc.

For the last 2 years, despite all the efficiency gains, I have literally been watching characters appear on my screen one by one, as if this were a hacker movie. Lately, I am also waiting at least 60s for anything to appear at all.

If that happened at 10x the speed, it would still be slow in computer terms, and that increasingly matters, because I will not be the one reading the output – it will be other computers. I think that, looking back a few years from now, every single piece of silicon being planned right now will look like a laudable but laughable drop in the ocean.

The quality that real demand needs is not there yet, so more processing is very probably needed, and efficiency gains may make that extra processing possible.

(A striking example I read today of what real demand needs in terms of quality: the administration of Albania wants some sort of automated Cabinet Minister. Not just an impartial and incorruptible algorithm (what we normally try to build with deterministic computation): a "minister". Good luck with that.)