
Comment by embedding-shape

10 hours ago

> The whole thesis falls apart though. You can't be on your way to "power over everything" and get distilled into free Chinese models within months. Pick one.

But is that last part actually true? Sure, there might be 600B+ models available for download and local inference if you have the hardware, but do the users who use Anthropic switch over to those, even when they're available as hosted models? It seems like some do but most don't. Anthropic and Claude remain very popular among people who use LLMs; there is no denying that.

> do the users who use Anthropic switch over to those, even when they're available as hosted models?

I'm currently spending $200 on Claude. That's around the maximum I can afford; I could stretch it to $500, I guess. But I've seen reports of people spending tens of thousands of dollars on the Claude API, which is certainly outside my budget.

So if/when Anthropic decides to stop subsidizing subscriptions (if they ever do; I'm still not sure about that), I'll certainly look at other options, and open-weights LLMs hosted by someone else will be my first pick. Right now Claude 4.8 feels very advanced, but things move very fast...

  • The AI labs would be very dumb to get rid of subscriptions. First, I don’t even think the subscriptions are losing money; I suspect they’re around break-even, maybe small losses. More importantly, subscriptions are how they lock in users and convince companies to pay API rates. Without the user loyalty they cultivate with subscriptions, businesses will just use the cheapest model on OpenRouter, or maybe local models.

    • > I don’t even think the subscriptions are losing money; I suspect they’re around break-even, maybe small losses

      What's the basis for this thought?


What's hot right now is smaller 'expert' models with an 'orchestrator' model in front that evaluates the prompts, routes them to the appropriate small models, and then synthesizes the collected answers. That's easier to split across many smaller, cheaper servers and more efficient than one huge monolithic model.
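A minimal sketch of the orchestrator pattern described above, assuming each "expert" is a separate small model reachable through a simple callable. The keyword router, the expert names and the join-based synthesis are all hypothetical stand-ins; in the setup the comment describes, the orchestrator would itself be an LLM doing the classification and the summarising.

```python
from typing import Callable, Dict, List


# Hypothetical "small expert models": plain functions standing in for
# calls to cheaper, specialised LLM endpoints.
def code_expert(prompt: str) -> str:
    return f"[code] answer to: {prompt}"


def math_expert(prompt: str) -> str:
    return f"[math] answer to: {prompt}"


def general_expert(prompt: str) -> str:
    return f"[general] answer to: {prompt}"


EXPERTS: Dict[str, Callable[[str], str]] = {
    "code": code_expert,
    "math": math_expert,
    "general": general_expert,
}

# Hypothetical keyword table; a real orchestrator model would classify the prompt itself.
KEYWORDS: Dict[str, List[str]] = {
    "code": ["python", "function", "bug", "compile"],
    "math": ["integral", "prove", "equation", "sum"],
}


def route(prompt: str) -> List[str]:
    """Pick every expert whose keywords appear in the prompt; fall back to 'general'."""
    text = prompt.lower()
    chosen = [name for name, words in KEYWORDS.items() if any(w in text for w in words)]
    return chosen or ["general"]


def orchestrate(prompt: str) -> str:
    """Route the prompt, collect each expert's answer, and synthesize one reply."""
    answers = [EXPERTS[name](prompt) for name in route(prompt)]
    # Synthesis here is just joining; an orchestrator model would summarise instead.
    return "\n".join(answers)


if __name__ == "__main__":
    print(orchestrate("Fix the bug in this Python function and prove it terminates"))
```

Splitting the work this way means each expert can live on its own, smaller server; only the router and the synthesis step see every request.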

  • Do you have more info about this? I can't tell if you're being misled by the unfortunate "Mixture of Experts" terminology (MoE doesn't work the way you're describing), or alluding to something different.

    Or maybe I'm wrong, but my understanding is that MoE is just an architecture for keeping the number of activated weights per token small. The experts get routed basically token by token, and the "experts" themselves don't have a semantic domain, so the word "expert" was maybe a poor choice.

    • No, this is an agent-level thing, not a feature of the model (ish, unsure for Fable).

      You talk to a smart, heavy model to build a plan composed of smaller steps. Then you have the heavy model spin up smaller, cheaper LLMs to actually implement the tasks.

      The heavy model is basically read-only in that mode. It can read files, execute tests, etc, but it can’t write code. It just tracks what needs to be done, offloads the work to dumber LLMs, validates the task is done, and moves on to the next step.

      It sort of pushes humans up the stack. Instead of having a human sitting there prompting the LLM to start the next task, you have another LLM do that loop.

      It’s been on my list to try out.
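The plan/delegate/validate loop described in this comment can be sketched as follows. `plan_fn`, `worker_fn` and `validate_fn` are hypothetical hooks standing in for calls to the heavy model and the cheaper worker models; no particular agent harness or API is implied, and the retry limit is an arbitrary assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Task:
    description: str
    done: bool = False
    attempts: int = 0


@dataclass
class Orchestrator:
    """Hypothetical stand-in for the heavy model: it plans, delegates and validates,
    but never produces code itself (the read-only role described above)."""
    plan_fn: Callable[[str], List[str]]      # heavy model: goal -> ordered steps
    worker_fn: Callable[[str], str]          # cheap model: step -> proposed change
    validate_fn: Callable[[str, str], bool]  # heavy model: read files / run tests, accept or reject
    max_attempts: int = 3
    log: List[str] = field(default_factory=list)

    def run(self, goal: str) -> List[Task]:
        tasks = [Task(step) for step in self.plan_fn(goal)]
        for task in tasks:
            # Keep handing the same step to a worker until it validates or we give up.
            while not task.done and task.attempts < self.max_attempts:
                task.attempts += 1
                result = self.worker_fn(task.description)
                task.done = self.validate_fn(task.description, result)
                status = "ok" if task.done else "retry"
                self.log.append(f"{task.description}: attempt {task.attempts} {status}")
        return tasks


def demo() -> Tuple[List[Task], List[str]]:
    """Stub models: the 'worker' fails its first try on step 2, then succeeds."""
    calls = {"n": 0}

    def plan(goal: str) -> List[str]:
        return [f"{goal}: step {i}" for i in range(1, 4)]

    def worker(step: str) -> str:
        calls["n"] += 1
        return "broken" if step.endswith("step 2") and calls["n"] == 2 else "patched"

    def validate(step: str, result: str) -> bool:
        return result == "patched"

    orch = Orchestrator(plan, worker, validate)
    return orch.run("add login"), orch.log


if __name__ == "__main__":
    tasks, log = demo()
    print("\n".join(log))
```

The human only supplies the goal; the loop of "start the next task, check it, retry or move on" is what gets handed to the heavy model.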

    • https://en.wikipedia.org/wiki/Mixture_of_experts#Sparsely-ga...

      "The sparsely-gated MoE layer,[21] published by researchers from Google Brain, uses feedforward networks as experts, and linear-softmax gating. Similar to the previously proposed hard MoE, they achieve sparsity by a weighted sum of only the top-k experts, instead of the weighted sum of all of them."

      "Top-k experts": in the case of some of DeepSeek's models, k=1.
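To make the token-level routing concrete, here is a toy, pure-Python version of sparsely-gated top-k gating as the quoted passage describes it: a linear gate produces one logit per expert, only the top-k logits are kept and softmaxed, and the output is the gate-weighted sum of just those experts. The "experts" are trivial scaling functions standing in for the feedforward networks, and all names and numbers are illustrative.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Sequence[float]], Vector]


def gate_logits(gate_weights: Sequence[Sequence[float]], x: Sequence[float]) -> Vector:
    """Linear gate: one weight row per expert, producing one logit per expert."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]


def top_k_gates(logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Keep the k largest logits and softmax over just those; every other expert gets 0."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x: Sequence[float], gate_weights: Sequence[Sequence[float]],
                experts: Sequence[Expert], k: int) -> Vector:
    """Route one token: evaluate only the selected experts and mix their outputs."""
    out = [0.0] * len(x)
    for i, g in top_k_gates(gate_logits(gate_weights, x), k):
        y = experts[i](x)
        out = [o + g * yi for o, yi in zip(out, y)]
    return out


def make_expert(scale: float) -> Expert:
    """Toy expert: scales its input (a stand-in for a feedforward network)."""
    return lambda x: [scale * xi for xi in x]


EXPERTS: List[Expert] = [make_expert(j + 1) for j in range(3)]
GATE_WEIGHTS = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0]]

if __name__ == "__main__":
    token = [1.0, 0.0]
    print(moe_forward(token, GATE_WEIGHTS, EXPERTS, k=1))  # only expert 1 runs
    print(moe_forward(token, GATE_WEIGHTS, EXPERTS, k=2))  # experts 1 and 0, softmax-weighted
```

With k=1 only one expert runs per token, which is where the compute savings come from. The gate is recomputed for every token, so consecutive tokens can land on different experts, which is why the "experts" end up without a clean semantic domain.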

People don't pivot on a dime. If major model improvements stopped for a few years while equivalent free models were available during the same period, we would see people slowly move over to competitors.

> Anthropic and Claude remains very popular among the people who use LLMs

Only because someone else is paying the bills. I use Claude Opus at work because my employer pays for the tokens and encourages me to do it.

At home, I use DeepSeek Flash. It's not as good, but it's maybe 0.7 quality for 0.001 cost.
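Taking the rough figures above at face value (DeepSeek Flash at about 0.7 of Opus's quality for about 0.001 of its cost, with Opus normalised to 1 on both), a crude quality-per-cost comparison works out as follows. This assumes quality can be scored on a single linear scale, which is a big simplification.

```latex
\frac{\text{quality}_{\text{Flash}} / \text{cost}_{\text{Flash}}}
     {\text{quality}_{\text{Opus}} / \text{cost}_{\text{Opus}}}
= \frac{0.7 / 0.001}{1 / 1} = 700
```

In other words, under those assumptions, roughly 700 times as much "quality per dollar".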

  • Same. I had DeepSeek search for, download, and transfer (to my Linux emulation machine) the best Dreamcast games yesterday.

    GPT refused to do so (citing that it's illegal even though I own the games). DeepSeek did a wonderful job for 7 cents.

    At work I use Opus because, why not? But I could easily switch to a less capable model if needed.

    • >citing that it's illegal even though I own the games

      In the US, at least, it is actually illegal to download ISOs/ROMs of games, even if you own a physical copy. It's a stupid law, and as a downloader (as opposed to the people hosting the files) your chances of getting into any kind of actual legal trouble are effectively zero, but it is still against the law.


  • I have a question that perhaps you or someone else here can answer: I enjoy using Opus via Google Antigravity (usually agy) for perhaps 90 minutes a week. On Google’s subsidized $20/month plan, they seem to give out a reasonably generous amount of Claude tokens. How does this compare with Anthropic’s $20/month plan using Claude Code?

    BTW, I also use DeepSeek v4 Flash very frequently: it's fast and so cheap it's almost free.

    • It’s really hard to translate minutes to tokens, it depends on how you’re using it.

      The best answer would be to pull session stats from your harness and compare that against the limits. I think Anthropic publishes the limits of each plan.

      If you’re using a pretty stock harness and not doing crazy multi-agent stuff with it, you’re probably fine.

      My girlfriend built a whole (but simple) React app with it and only hit the limits of the $20 plan once. In fairness, she was trying to get it to clean up a bunch of ~800-line React files at once with a vague “make it look nice” prompt that she ran a few times. I think it was just churning for about half an hour straight before she burned through all her credits.

      It’s probably enough if you’re not following a fully agentic development strategy. It’s plenty for having it write tests, comments and the like, just not enough to keep it doing giant refactoring passes.
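The "pull session stats and compare against the limits" suggestion above can be sketched as a simple sum over recorded turns, assuming your harness can export per-turn input and output token counts and that your plan were metered as a plain token budget. Both the `Turn` record and the budget figure are hypothetical; real plans may count usage differently.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Turn:
    """One request/response pair as a harness might record it (hypothetical fields)."""
    input_tokens: int
    output_tokens: int


def usage_fraction(turns: Iterable[Turn], token_budget: int) -> float:
    """Share of a (hypothetical) token budget consumed by the given turns."""
    if token_budget <= 0:
        raise ValueError("token_budget must be positive")
    used = sum(t.input_tokens + t.output_tokens for t in turns)
    return used / token_budget


if __name__ == "__main__":
    session = [Turn(12_000, 1_500), Turn(30_000, 4_000)]
    print(f"{usage_fraction(session, 100_000):.1%} of the assumed budget")
```

Input tokens usually dominate in agentic sessions because the whole context is resent each turn, which is why long "make it look nice" loops over big files burn through an allowance quickly.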

    • Anthropic's plans are based on usage as the user experiences it, not raw token counts: you get a certain number of conversation turns, etc. within a 5-hour usage window. (Cursor, OpenCode Go, and others are similar.)

      Cursor's $20 a month plan provides a reasonable amount of Opus tokens as well.
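One way to picture a usage-window allowance like the one described above is a window that opens on your first message and resets 5 hours later, with a cap on turns inside it. This is an assumption for illustration only: the turn cap, the reset rule and the `UsageWindow` class are hypothetical, and the actual accounting used by Anthropic, Cursor or others may differ.

```python
from dataclasses import dataclass
from typing import Optional

WINDOW_SECONDS = 5 * 60 * 60  # the 5-hour window mentioned above


@dataclass
class UsageWindow:
    max_turns: int                        # hypothetical per-window allowance
    window_start: Optional[float] = None  # time (seconds) the current window opened
    turns_used: int = 0

    def try_turn(self, now: float) -> bool:
        """Return True if a conversation turn is allowed at time `now` (seconds)."""
        if self.window_start is None or now - self.window_start >= WINDOW_SECONDS:
            # First message, or the previous window expired: open a fresh window.
            self.window_start = now
            self.turns_used = 0
        if self.turns_used >= self.max_turns:
            return False  # allowance exhausted until the window resets
        self.turns_used += 1
        return True


if __name__ == "__main__":
    w = UsageWindow(max_turns=2)
    print([w.try_turn(t) for t in (0, 60, 120, WINDOW_SECONDS)])
```

Under this model, what matters is how much you cram into one 5-hour stretch, not your weekly total, which fits the "90 minutes a week is probably fine" intuition.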

I don't think you're seeing the full picture. The individuals who can only spend $200/month will be stuck. But the pie is growing; it's not stagnant. There are a lot of orgs that can afford to run a 1T model, and even more that can run a 600B model. These newcomers are what's being fought over.
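For a rough sense of what "afford to run a 1T model" means in hardware, weight memory can be estimated as parameters times bytes per parameter. This sketch ignores KV cache, activations and serving overhead, and the 80 GB device size is an assumption, so treat the results as lower bounds rather than a sizing guide.

```python
import math


def weight_memory_gb(params: float, bits_per_param: int) -> float:
    """Memory for the weights alone: parameters x bytes per parameter (decimal GB)."""
    return params * bits_per_param / 8 / 1e9


def accelerators_needed(params: float, bits_per_param: int, gb_per_device: float) -> int:
    """Minimum device count just to hold the weights; ignores KV cache, activations, overhead."""
    return math.ceil(weight_memory_gb(params, bits_per_param) / gb_per_device)


if __name__ == "__main__":
    for name, params in [("600B", 600e9), ("1T", 1e12)]:
        for bits in (16, 8, 4):
            gb = weight_memory_gb(params, bits)
            n = accelerators_needed(params, bits, 80.0)  # hypothetical 80 GB devices
            print(f"{name} @ {bits}-bit: {gb:,.0f} GB of weights, >= {n} x 80 GB devices")
```

At 8-bit, for example, that is about 600 GB for a 600B model and about 1 TB for a 1T model: far beyond a single workstation, but well within reach of an org's inference cluster.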