Comment by regularfry
4 days ago
> Except they can't. Their costs are not magically lower when you use claude code vs when you use a third-party client.
I don't have a dog in this fight but is this actually true? If you're using Claude Code they can know that whatever client-side model selection they put into it is active. So if they can get away with routing 80% of the requests to Haiku and only route to Opus for the requests that really need it, that does give them a cost model where they can rely on lower costs than if a third-party client just routes to Opus for everything. Even if they aren't doing that sort of thing now, it would be understandable if they wanted to.
It (CC) does have a /models command, you can still decide to route everything to Opus if you just want to burn tokens I guess it's not default so most wouldn't, but still, people willing to go to a third party client are more likely that kind of power user anyway
They still have the total consumption under their control (*bar prompt caching and other specific optimizations) where in the past they even had different quotas per model, it shouldn't cost them more money, just be a worse/different service I guess
> it shouldn't cost them more money
As things are currently, better models mean bigger models that take more storage+RAM+CPU, or just spend more time processing a request. All this translates to higher costs, and may be mitigated by particular configs triggered by knowledge that a given client, providing particular guarantees, is on the other side.
That’s kind of the point. Even if users can choose which model to use (and apparently the default is the largest one), they could still say (For roughly the same cost): your Opus quota is X, your Haiku quota is Y, go ham. We’ll throttle you when you hit the limit.
1 reply →
> It (CC) does have a /models command, you can still decide to route everything to Opus if you just want to burn tokens I guess it's not default so most wouldn't
Opus is claude code's default model as of sometime recently (around Opus 4.6?)
That’s not how Claude Code works. It’s not like a web chatbot with a layer that routes based on complexity of request.
You don't control what happens when a request hits their endpoint though.