Comment by stkdump

11 days ago

It's interesting how quickly people buy the "abuse" line of thinking. We understood (and knew for a long time) that the large AI labs are not monetarily profiting from subscription users that make heavy use of their subscription. That is independent of which agent/harness is used. The fair/real price for profitable use is the pay per use token pricing.

These labs play the game of trying to kill competition in the harness game (because third party harnesses risk commoditizing the underlying LLMs once they are all good enough), while playing a game of chicken with each other over how long they can burn money that way before they have to give up.

At some point they have to price their product fairly, and the only hope they have is to have killed all competition by then, which is of course a game that they seem to be losing. Useful models are getting smaller and cheaper to run every year, and it has hit a threshold at which we will see continued development of third party harnesses even without the userbase of subscription users.

Basically the prime bet that they made (that one needs extremely expensive hardware to have useful AI) has already failed. The secondary bet that they can lock users into their ecosystem (which requires them to subsidize their harness via unprofitable subscriptions burning their capital) and be able to monetize that later will also fail. They will have to compete on merit alone, and that is much less profitable.

It's a big leap to go from "some users may be using large quantities of tokens" to "the labs are burning money on subs in an attempt to kill the competition."

Lots of businesses have subscription programs in which a small number of users are money losers, but which in aggregate make money.

It's not even obvious that the labs are losing a lot of money on even a minority of users; the usage caps are fairly aggressive for Anthropic, and a cursory analysis of the likely actual cost of serving tokens shows they are high-margin products at the API level and unlikely to be unprofitable within the usage constraints provided to subscribers.

I do think subscription models make commercial sense because users want predictable costs, and it's a club good in which marginal token cost is zero which helps consolidate their customers' purchasing volume to one provider. But that's a different claim than them serving it unprofitably to kill competition.

Also, they (Anthropic) are transitioning many of their enterprise customers to API consumption billing anyway.

  • I work in the video AI world.

    We gave up on subscriptions long ago. They're rinky dink and get you a paltry amount of utilization before they run out.

    The per day per seat costs can exceed $1000. This is already normal for studios, and it's already producing positive ROI.

    There's simply no way to price video any other way than by usage. I suspect the same will come for everything.

    • > There's simply no way to price video any other way than by usage. I suspect the same will come for everything.

      I don't think there's any way for all of the current AI models to work except as a usage model. The question is whether or not people are willing to pay for it that way in the long-term.

      It sounds like it is producing positive ROI for your side, but I’m curious what the bean counters at the studios think of the bill when the budgets tighten.

> Basically the prime bet that they made (that one needs extremely expensive hardware to have useful AI) has already failed.

I thought the prime bet was that the winning lab that reaches takeoff through recursive self-improvement will make a galactic superintelligence. Not saying I believe this, but the people running the labs do. Under this scenario, if you are a few months behind at the pivotal time you might as well not exist at all.

  • only if said galactic superintelligence takes immediate steps to kill all its potential competitors, or hoover up all the world's resources, or some other aggressively zero sum thing. otherwise I don't see what difference it makes down the line if you have the second superintelligence rather than the first.

    and that's under the assumption that you can create a superintelligence that will continue to slavishly serve your agenda rather than establishing and following its own goals.

    • This is also assuming that AGI is even possible. So far there is no evidence that this is actually doable over anything but billions of years (and even then we have no idea how nature really managed it).

      Edit: Meant to say AGI (superintelligence didn't make sense). Superintelligence is undefinable at the moment, so even considering whether it's possible is more of a philosophical thing/sci-fi thought experiment than anything else.

    • One could argue that AI has already started to hoover up all the world’s resources. AI buildout as a percent of GDP is already high and still rising.

    • Anthropic/OpenAI aren't planning to have their superintelligence take over the world, but they're still afraid that someone else will do it.

    • Well, no, because no one is going to be coming in to work building the next AI model after the Singularity.

      We’ll all be bblbrvkxn46?/4!gfbxf’mgv5fhxtgcsgjcucz to buvtcibycuvinovrYdyvuctYcrzuvhxh gcuch7…:!

  • I don't think this race-to-superintelligence idea should be taken too seriously. It is great for headlines and gets people's imaginations going. It is mostly a marketing gag.

    I look at superintelligence this way: software engineering used to be considered among the most mentally demanding jobs one can have. And in this field, more and more people give up large parts of their job and become approximately product managers, letting the machine do the engineering part. So we are about there. Who cares that there are some puzzles in some "synthetic" benchmark in which humans outsmart AIs?

    • The people in that community have been talking about superintelligence for decades and it’s part of an ideology. It’s not some recently-invented story for headlines.

  • One thing I don’t understand about this viewpoint (which I understand isn’t your own): why does one benefit so tremendously from getting there a month before competitors? I’m sure having a month of superintelligence with no competition would be lucrative, but do they think achieving superintelligence first will impede competitors from also achieving it a month later?

    • A week of superintelligence should be enough to take over the world, or at least sabotage your competitors. And even if someone else gets there a week later, they'll be permanently one week behind the curve (until the AI hits some physical limit, I suppose).

      But that's all just sci-fi worldbuilding.

    • A month with a superintelligence at your hands could be quite impactful, especially if you're willing to break the law / normal operating decorum in the pursuit of protecting what you have. A superintelligence, if wielded so, could destroy your competitors in a great many ways, ranging from the relatively benign solution of outcompeting them to exploiting them and tearing them apart from the inside.

      A genuine superintelligence is a very, very scary thing to have under the control of one person or organisation.

    • Assuming it can't super-hack all computer systems and cripple competing SI incubation to at least increase its lead time indefinitely.

      The assumption would be that in the lead time it has, the super intelligence takes at least a small lead and undermines any paths a later-arriving super intelligence could take to interfere with its goals, which naturally includes stopping competing SIs from becoming powerful enough to undermine it.

      So assuming the super intelligence has goals and works towards them, it will initially try to solidify its own power; iterating on that small lead, assuming it's the smartest super intelligence[1], should be enough to win. The scary part is that, assuming no guardrails[2], it's going to be as ruthless as possible in achieving those goals. That does not necessarily mean it will appear ruthless in achieving those goals, just as ruthless as it judges optimal.

      1. Which, being so smart, one of its chores would have been reinvesting in making itself smarter than the competition, and being smarter than its makers it has a good chance of actually carrying out those self-improvements.

      2. In the internal-balancing-of-goals sense, not the don't-feed-the-mogwai-after-midnight sense.

    • It's a tenet of the eschatology from the singularity ideology that was developed on online forums over the last few decades.

      The viewpoint is baked into those assumptions and boils down to the power of exponentials and poor application of game theory.

> We understood (and knew for a long time) that the large AI labs are not monetarily profiting from subscription users that make heavy use of their subscription.

I don't think this is "understood" or "known" to anyone except Ed Zitron. Subscription plans like Claude Code also have rolling usage limits, so they could be profitable. Inference is very cheap, and unless you're using OpenClaw, no one is actually maxing out the usage window at all times. I'm sure in aggregate the subs are not money furnaces.

  • Then explain why they started banning all third party harnesses, including those that work through Claude Code, if it still makes them money. They are cutting off profit for no good reason?

    I think there were reasons to doubt that heavy subscription users are unprofitable before they did that. OpenClaw was just the tip of the iceberg.

    Why don't they make token pricing dynamic if that were the case? It should then allow heavy users to get even more for their money than with the current subscription model, where they can't adjust to current infra availability.

    It may be that "in aggregate" sub users are not (yet) a losing business. But in all fairness, the more useful AI gets, the more it will be used. And the more it is used, the harder it will be to make subs cheaper than token pricing. The only counterweight is new light users, but those will also become heavy users over time, the more useful it becomes for them. And at some point it will be hard to onboard light users in the first place, because the laggards will require even more intelligence and value to win them over.

    • If each additional user is a net benefit for them, but they're still struggling to find enough capacity, it would make sense to cut down usage from existing users so they can onboard new ones.

      They're trying to capture the market! Can't do that if you have to stop onboarding users because NVIDIA are struggling to manufacture enough GPUs for you.

> We understood (and knew for a long time) that the large AI labs are not monetarily profiting from subscription users that make heavy use of their subscription.

"profit" is a weird concept in the software business. it might be true that there is an opportunity cost to these users, either because they displace other potential users by using up capacity, or because they would be willing to pay more if forced. but I don't believe that anyone is losing money on inference costs on any of their plans.

> At some point they have to price their product fairly

they are competing in a market. if most of their costs were inference then this would be a good thing, because everyone would have roughly the same prices, so as long as they had the best model they would win. in fact model development costs eclipse the cost of inference, and that is something that non-frontier labs get for much cheaper by distilling from the frontier companies.

> They will have to compete on merit alone, and that is much less profitable.

that's not really true. google won search on merit alone, and were massively successful as a result. the trick is that everyone from the poorest shmuck to the richest businessman uses google, so they win through scale. in ai, google and openai are making a bet that they can do the same thing. there's only really room for one winner at this game, even two is stretching it, so anthropic has to win by being the smartest model that only high end businesses use. that's a very risky bet.

> Useful models are getting smaller and cheaper to run every year and it has hit a threshold at which we will see continued development of third party harnesses even without the userbase of subscription users.

As of May 2026, how much money do I need to spend to buy hardware to have a local model that is 80% as good as SOTA services for assisting me in writing code?

As for that 80%, how many minutes per LOC will I be waiting, and how many attempts per query will I be wasting while I wait for it to come up with something sensible?

  • > As of May 2026, how much money do I need to spend to buy hardware to have a local model that is 80% as good as SOTA services for assisting me in writing code?

    https://llm-stats.com/benchmarks/swe-bench-verified

    SOTA (public proprietary models) would be Opus 4.7 at 0.876

    80% of that would be around 0.7.

    These models qualify, and are upwards of 90% as good in benchmarks:

      DeepSeek-V4-Pro-Max - 1.6T (HuggingFace shows 862B, huh) - 0.806
      Kimi K2.6 - 1.1T - 0.802
      MiniMax M2.5 - 229B - 0.802
      DeepSeek-V4-Flash-Max - 284B (HuggingFace shows 158B as well) - 0.790
    

    These are 80-90% as good, which is also where you see the smaller ones:

      GLM-5 - 754B - 0.778
      Qwen3.6-27B - 27B - 0.772
      Kimi K2.5 - 1.1T - 0.768
      Qwen3.5-397B-A17B - 397B - 0.764
      Step-3.5-Flash - 199B - 0.744
      GLM-4.7 - 358B - 0.738
      MiMo-V2-Flash - 310B - 0.734
      Qwen3.6-35B-A3B - 35B - 0.734
      DeepSeek-V3.2 - 685B - 0.731
      DeepSeek-V3.2-Speciale - 685B - 0.731
      DeepSeek-V3.2 (Thinking) - 685B - 0.731
      Qwen3.5-27B - 27B - 0.724
      Qwen3.5-122B-A10B - 125B - 0.720
      Kimi K2-Thinking-0905 - 1T - 0.713
      LongCat-Flash-Thinking-2601 - 562B - 0.700
    

    Out of those, the most modest one you could get is Qwen3.6-35B-A3B because the MoE nature makes it faster across more varied hardware.

    I currently run the Unsloth 8bit quants on-prem (on a bunch of Nvidia L4 GPUs, since low TDP, long story), some people swear by more quantized versions but with the small models the impact is felt more: https://huggingface.co/unsloth/Qwen3.6-35B-A3B-GGUF

    So essentially you need up to 39 GB for the model itself and then some for the KV cache and whatever context size you want. Ideally I'd aim for 64 GB of memory for that, though if really pressed for resources, could get a heavily quantized version within 32 GB (but very little memory for context and kinda shit).
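    As a sanity check on those numbers, here is a rough sizing sketch. The formulas are the standard ones (weights = parameters × bytes per weight; KV cache grows linearly with context length), but the layer count, KV head count, and head dimension below are illustrative assumptions, not the actual Qwen3.6-35B-A3B config:

```python
# Back-of-envelope memory sizing for running a local model.
# All architecture numbers below are assumed for illustration.

def model_memory_gb(params_b: float, bits_per_weight: float) -> float:
    """Weight storage: parameters * bytes per weight."""
    return params_b * 1e9 * (bits_per_weight / 8) / 1e9

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context_tokens: int, bytes_per_elem: int = 2) -> float:
    """KV cache: 2 tensors (K and V) per layer, fp16 elements by default."""
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_elem / 1e9

# 35B parameters at an 8-bit quant -> ~35 GB of weights,
# plus a KV cache for a 64k-token context (assumed 48 layers,
# 8 KV heads, head dim 128) -> roughly another ~13 GB.
weights = model_memory_gb(35, 8)
cache = kv_cache_gb(layers=48, kv_heads=8, head_dim=128, context_tokens=65536)
print(f"weights ~= {weights:.0f} GB, KV cache ~= {cache:.1f} GB")
```

    Which is why the 64 GB target above is comfortable and 32 GB only works with heavier quantization and a cramped context.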

    Personally, I think that you need about 45-60 tokens/second for decent usability - even comparatively modest hardware (including those L4) can run the model, though on the lower end options you will not be running parallel sub-agents etc.
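    To put that usability threshold in perspective, a quick wait-time calculation (the 2000-token response length is an assumed typical agent turn, and prompt-processing time is ignored):

```python
# Rough wait time per response at different decode speeds.
# The 2000-token response size is an illustrative assumption.

def wait_seconds(response_tokens: int, tokens_per_second: float) -> float:
    """Time to stream a full response at a given generation speed."""
    return response_tokens / tokens_per_second

for tps in (25, 30, 50, 60):
    print(f"{tps:>2} tok/s -> {wait_seconds(2000, tps):5.1f}s per 2000-token response")
```

    So the jump from ~25 to ~50 tok/s is the difference between waiting over a minute per turn and well under one, which matters a lot once an agent loops over many turns.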

    Some random results for when you don't want a traditional multi-GPU setup:

      Mac Mini - about 1999 USD, gets you somewhere upwards of 30 tokens/second (depends on quantization and how you run it)
      Framework Desktop - about 2500 USD, gets you somewhere upwards of 25 tokens/second https://community.frame.work/t/framework-desktop-for-local-ai/80880/5
      DGX Spark - about 3500 USD, gets you somewhere upwards of 50 tokens/second https://forums.developer.nvidia.com/t/qwen-qwen3-6-35b-a3b-and-fp8-has-landed/366822/27
    

    Some random results from pulling up random shops and approx. benchmarks, for dual GPU setups (not necessarily NVLink etc.):

      2x Intel Arc Pro B70 - about 1900 USD, gets you around 36 tokens/second, borderline usable, I blame their software stack
      2x Radeon AI PRO R9700 - about 3000 USD, gets you somewhere upwards of 60 tokens/second, usable
      2x Radeon PRO W7800 - about 5400 USD, same as above
      2x NVIDIA RTX 5090 - about 7600 USD, same as above
      2x NVIDIA RTX 5000 Ada - about 9200 USD, same as above
    

    Of course, for those models, some of those cards are way overkill, but you definitely can get something for running local models without too many compromises involved. That said, you will definitely get a worse experience than SOTA cloud models at that 80% and will have to rework stuff fairly often, as my own experience with the Qwen model shows - okay for simple tasks, breaks down on complex stuff. For that, you'd want at least some of the 90% category models and would probably need to consider how much memory you can realistically get.

    At least it's not hopeless!

>Basically the prime bet that they made (that one needs extremely expensive hardware to have useful AI) has already failed.

Honestly, I don't think it's that cut and dry. Their bet is that the marginal utility of having a smarter model more than makes up for the cost of the additional high-end hardware.

And honestly, if you look at their frankly insane revenue growth since Opus 4.5 released, they were right.

>The secondary bet that they can lock users into their ecosystem (which requires them to subsidize their harness via unprofitable subscriptions burning their capital) and be able to monetize that later will also fail.

I think we're already past this point, honestly. They lowered usage limits, blocked OpenClaw, and then tried to remove Claude Code from the $20/mo plan. They have always had low market share in the consumer chatbot market and don't seem to care about catching up to OpenAI there.

What about the data they are accumulating, for non-training purposes? That data isn't of negligible value; the "subscription cost" is really a data-harvesting opportunity. Don't be naive and assume our data is not incredibly valuable.

> These labs play the game of trying to kill competition in the harness game

Anthropic and Google are arguably playing that game. OpenAI's Codex CLI is open source and entirely optional for use of the GPT Codex models.

  • OpenAI just has more runway and has convinced its investors that it is as much about hardware (stargate) as it is about anything else. So they think they can, and have to, afford keeping the software side more open so as not to make themselves look stupid. Google is more of a down-to-earth company with other business to lose and isn't bought into it as much.

If you were right, Anthropic's ARR would be going down, but it's not. They just surpassed $30B, up from $14B two months ago.