← Back to context

Comment by spmurrayzzz

4 hours ago

He has also consistently demonstrated, at least to me, that he doesn't really understand how inference works from a technical perspective, which weakens much of his core thesis for why there should be a collapse.

I do value having some naysayers in the mix generally, because we do need balanced critique in what is otherwise a very frothy hype cycle. I just don't think he's making sound arguments, and that's even assuming you even agree with his premises in the first place.

My biggest gripe with his napkin math is that he treats inference gross margins as something novel that you can't compare to normal SaaS margins. He's right in part: the constant carousel of R&D costs from model training, related infrastructure buildout, and other adjacent costs required to stay competitive do change the analysis a bit.

But he takes this way too far when he says this is structurally different from normal SaaS margins. The business model definitely doesn't look like Dropbox, but it absolutely looks a lot like AWS, especially early AWS, CDNs, telecom, etc. I can speak to the telecom bit personally, since it's been over half of my professional career as an engineer and, in this specific case, also as a founder. You can have a brutally capital-intensive infra business where profitability depends on utilization, oversubscription, peak-capacity planning, segmentation, and recovering capex over time.

The math he presents gets even more questionable as we see explicit segmentation happening for cost-saving reasons. Many forward-thinking orgs are waking up to the fact that they don't need to use the best, most expensive model for every task. They can route easier tasks to cheaper models, use caching, batch non-urgent workloads, and reserve frontier models for the subset of work that actually needs frontier intelligence. That directly undermines his claim that providers always need to chase frontier intelligence in order to maintain current demand, utilization, and pricing curves.

I think he doesn't need to understand the technology to point out the books are cooked. a business can sink in either way: the technology flops or the finances flop. he's arguing the /finances/ would flop. he doesn't argue that the /technology/ would flop, only that they can't come up with the money to pay their debters.

  • There is a piece of this I agree with. That you do not need to be a deep technical expert to notice that a company is burning cash by overcommitting to capex, or relying on heroic revenue projections that may or may not come to pass.

    But that is not the full argument he is making. If the claim is that the labs will not be able to pay their creditors because inference is structurally incapable of becoming profitable, then he absolutely needs to be right about the technical economics of inference.

    One part of that is the balance-sheet argument (which already shows insanely good margins). But it also depends on how inference-time compute actually works: routing, batching, kv cache reuse, model segmentation, different latency tiers, etc. Much of those details he's just been straight up wrong about in his writing, so as a result I have to call into question the rest of his reasoning as well (in part to avoid Gell-Mann amnesia).

> That directly undermines his claim that providers always need to chase frontier intelligence in order to maintain current demand, utilization, and pricing curves.

But does it also not mean that they will make less money given that there is already brutal competition for that lower tier from openrouter, Deepseek, Amazon, etc.?

You can't on the one hand say "customers are beginning to understand they can spend less" and on the other hand suggest that this is good for forecasts of revenue.

> that he doesn't really understand how inference works from a technical perspective

Could you share what tells about it? I.e. where he was wrong about it?

  • There's examples both in his writing and also in his appearances on podcasts, interviews, etc.

    I'll cherry pick a couple:

    “When these new models ‘reason,’ they break a user’s input and break into component parts, then run inference on each one of those parts.” [1]

    This is not at all how test-time compute works. At best, this is a very loose metaphor that he may have used out of convenience. This might sound a bit pedantic to point out, but this is a very basic thing that he's getting wrong (presumably at least, again it could be that he just used a poor metaphor).

    A less pedantic example would be his claims related to gpt-5/chatgpt auto-routing. He argued that having a router means OpenAI can no longer cache static prompts, because the user prompt has to come before the hidden instructions [2]. This is just not at all how this works at inference-time. There is no evidence that the standard approach of system>developer>user instruction hierarchy has changed, the public API and caching docs maintain this.

    But even more broadly, it suggests he is reasoning about kv/prefix caching at the wrong level of abstraction. It's true that conventional prefix caching does require a stable prefix, so yes, if you literally put variable user content before the static prompt, you would destroy the cacheability of that static prompt.

    But that is exactly why inference systems are designed to preserve reusable prefixes where possible (via checkpointing or similar), and why serving systems care so much about prefix caching. This is also a big part of how disaggregated prefill/decode infra works where cache-aware routing is critical. His argument treats a bad prompt layout as if it were a necessary consequence of routing, rather than an avoidable implementation choice.

    A router can read the user request, decide which model path to use, and then construct a normal downstream model call with stable static instructions first and user content later. Treating that as impossible implies a fundamental architectural misunderstanding.

    [1] https://www.wheresyoured.at/how-to-argue-with-an-ai-booster/

    [2] https://www.wheresyoured.at/how-does-gpt-5-work/