Comment by aanet
18 hours ago
^ THIS
I've run out of quota on my Pro plan so many times in the past 2-3 weeks. This seems to be a recent occurrence. And I'm not even that active. Just one project, execute in Plan > Develop > Test mode, just one terminal. That's it. I keep getting a quota reset every few hours.
What's happening @Anthropic ?? Anybody here who can answer??
[BUG] Instantly hitting usage limits with Max subscription: https://github.com/anthropics/claude-code/issues/16157
It's the most commented issue on their GitHub and it's basically ignored by Anthropic. Title mentions Max, but commenters report it for other plans too.
It's not a bug, it's a feature (for Anthropic).
It's not a bug, it's a poorly defined business model!
“After creating a new account, I can confirm the quota drains 2.5x–3x slower. So basically Max (5x) on an older accounts is almost like Pro on a new one in terms of quota. Pretty blatant rug pull tbh.”
lol
Your quota also seems to be higher after unsubscribing and resubscribing?
This whole API vs plan looks weird to me. Why not force everyone to use API? You pay for what you use, it's very simple. API should be the most honest way to monetize, right?
This fixed subscription plan with barely specified quotas looks like they want to extract extra money from the users who pay $200 and don't use that value, while at the same time preventing other users from going over $200. I understand that it might work at scale, but it just feels a bit unfair to everyone?
Not a doctor or anything, but API usage seems to support the more on-demand / spiky workflows available at a much larger scale, whereas a single seat, authenticated to Claude Code has controlled / set capacity and is generally more predictable and as a result easier to price?
API request method might have no cap, but they do cap Claude Code even on Max licenses, so easier to throttle as well if needed to control costs. Seems straightforward to me at any rate. Kinda like reserved instance vs. spot pricing models?
You're welcome to use the API, it asks you to do that when you run out of quota on your Pro plan. The next thing you find out is how expensive using the API is. More honest, perhaps, but you definitely will be paying for that.
I tried the API once. Burned 7 dollars in 15 minutes.
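For scale, here is that burn rate extrapolated. This is only a rough sketch: it assumes the rate stays constant, and the $20 and $200 budgets are just the plan prices mentioned elsewhere in this thread, used as reference points.

```python
# Extrapolate the burn rate quoted above: $7 of API usage in 15 minutes.
# Assumption: the rate stays constant, which real sessions will not do.
spent_usd = 7.0
minutes = 15

hourly_usd = spent_usd * 60 / minutes  # dollars per hour of continuous use


def minutes_until(budget_usd: float) -> float:
    """Minutes of continuous use at this rate before budget_usd is spent."""
    return budget_usd / hourly_usd * 60


print(f"hourly rate: ${hourly_usd:.2f}")
print(f"$20 lasts about {minutes_until(20):.0f} minutes")
print(f"$200 lasts about {minutes_until(200) / 60:.1f} hours")
```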
Consumers like predictable billing more than they care about getting the most bang for their buck and beancounters like sticky recurring revenue streams more than they care about maximizing the profit margins for every user.
I just like being able to make like $250 of API calls for $20.
The fixed fee plan is because the agent and the tools have internal choices/planning about cost. If you simply pay for API the only feedback to them that they are being too costly is for you to stop.
If you look at tool calls like MCP and what not you can see it gets ridiculous. Even though it's small for example calling pal MCP from the prompt is still burning tokens afaik. This is "nobody's" fault in this case really but you can see how the incentives are and we all need to think how to make this entire space more usable.
I very recently (~ 1 week ago) subscribed to the Pro plan and was indeed surprised by how fast I reached my quota compared to say Codex with similar subscription tier. The UX is generally really cool with Claude Code, which left me with a bit of a bittersweet feeling of not even being able to truly explore all the possibilities since after just making basic planning and code changes I am already out of quota for experimenting with various ways of using subagents, testing background stuff etc.
The best thing about the max plan has been that I don’t have “range anxiety” with my workflows. This opens me to trying random things on a whim and explore the outer limits of the LLM capabilities more.
I remember a couple of weeks ago when people raved about Claude Code I got a feeling like there's no way this is sustainable, they must be using tokens like crazy if used as described. Guess Anthropic did the math as well and now we're here.
I use opencode with codex after all the shenanigans from anthropic recently. You might want to give that a shot!
Use cliproxyapi and use any model in CC. I use Codex models in CC and it's the best of both worlds!
Like a good dealer, they gave you a cheap/free hit and now you want more. This time you're gonna have to pay.
I've been hitting the limit a lot lately as well. The worst part is I try to compact things and check my limits using the / commands and can't make heads or tails of how much I actually have left. It's not clear at all.
I've been using CC until I run out of credits and then switch to Cursor (my employer pays for both). I prefer Claude but I never hit any limits in Cursor.
Hmm, are you using the /usage command? There’s also the ccusage package that I find useful.
Thanks. I don't know why but I just couldn't find that command. I spent so much time trying to understand what /context and other commands were showing me that I got lost in the noise.
> I've run out of quota on my Pro plan so many times in the past 2-3 weeks.
Waiting for Anthropic to somehow blame this on users again. "We investigated, turns out the reason was users used it too much".
sounds like the "thinking tokens" are a mechanism to extract more money from users?
Anecdotal, but it definitely feels like in the last couple of weeks CC has been more aggressive at pulling in significantly larger chunks of an existing code base - even for some simple queries I'll see it easily ramp up to 50-60k token usage.
This really speaks to the need to separate the LLM you use and the coding tool that uses it. LLM makers utilizing the SaaS model make money on the tokens you spend whether or not they need them. Tools like aider and opencode (each in their own way) use separate tools to build a map of the codebase that they can use to work with code using fewer tokens. When I see posts like this I start to understand why Anthropic now blocks opencode.
We're about to get Claude Code for work and I'm sad about it. There are more efficient ways to do the job.
I'm curious if anyone has logged the number of thinking tokens over time. My implication was that the "thinking/reasoning" modes are a way for LLM providers to put their thumb on the scale for how much the service costs.
they get to see (if not opted-out) your context, idea, source code, etc. and in return you give them $220 and they give you back "out of tokens"
It's absolutely a work-around in part, but use sub-agents, have the top level pass in the data, and limit the tool use for the sub-agent (the front matter can specify allowed tools) so it can't read more.
(And once you've done that, also consider whether a given task can be achieved with a dumber model - I've had good luck switching some of my sub-agents to Haiku).
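To make that concrete, here is a small Python sketch that renders such a sub-agent definition file. Treat it as an illustration, not the documented format: the front matter keys (`name`, `description`, `tools`, `model`) and the `haiku` model alias are my understanding of Claude Code's sub-agent files and should be checked against the current docs, and the agent name, description, and prompt are made up.

```python
def render_subagent(name: str, description: str, tools: list[str],
                    model: str, prompt: str) -> str:
    """Render a Markdown sub-agent definition with YAML-style front matter.

    Assumption: the tool reads `tools` as a comma-separated allow-list and
    `model` as a model alias. Verify both against the official docs.
    """
    lines = [
        "---",
        f"name: {name}",
        f"description: {description}",
        f"tools: {', '.join(tools)}",  # keep this list short to limit reads
        f"model: {model}",
        "---",
        "",
        prompt,
        "",
    ]
    return "\n".join(lines)


# Hypothetical agent: the parent passes the diff in, so only Grep is allowed.
text = render_subagent(
    name="diff-summarizer",
    description="Summarizes a diff passed in by the parent agent.",
    tools=["Grep"],
    model="haiku",
    prompt="Summarize the diff you are given. Do not open other files.",
)
print(text)
```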
> more aggressive at pulling in significantly larger chunks of an existing code base
They need more training data, and with people moving on to OpenCode/Codex, they wanna extract as much data from their current users as possible.
Their system prompt + MCP is more of the culprit here. 16 tools, sophisticated parameters, you're looking at 24K tokens minimum
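A back-of-the-envelope sketch of why that fixed prefix matters. It assumes the worst case: the full prefix is resent on every turn and counted in full each time. How cached prefix tokens are actually counted against the quota isn't stated anywhere in this thread, so the real number may well be lower.

```python
# Worst-case sketch: the fixed prefix (system prompt + tool definitions) is
# resent on every turn and counted in full each time. Caching is ignored.
FIXED_PREFIX_TOKENS = 24_000  # figure quoted in the comment above


def prefix_tokens(turns: int, prefix: int = FIXED_PREFIX_TOKENS) -> int:
    """Input tokens spent on the fixed prefix alone over `turns` requests."""
    return turns * prefix


for turns in (1, 10, 50):
    print(f"{turns:>3} turns -> {prefix_tokens(turns):,} prefix tokens")
```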
probably, because they recently said the ultrathink is enabled by default now.
does this translate into "the end-user's cost goes up"
by default?
It's the clanker version of the "Check Wallet Light" (check engine light).
How quickly do you also hit compaction when running? Also, if you open a new CC instance and run /context, what does it show for tools/memories/skills %age? And that's before we look at what you're actually doing. CC will add context to each prompt it thinks is necessary. So if you've got a small number of large files (vs. a large number of smaller files), at some level that'll contribute to the problem as well.
Quota's basically a count of tokens, so if a new CC session starts with that relatively full, that could explain what's going on. Also, what language is this project in? If it's something noisy that uses up many tokens fast, even if you're using agents to preserve the context window in the main CC, those tokens still count against your quota so you'd still be hitting it awkwardly fast.
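A toy illustration of that last point, with made-up numbers. It assumes the quota is a plain sum of input and output tokens across every run, which is a simplification (the real accounting may weight them differently).

```python
from dataclasses import dataclass


@dataclass
class Run:
    """Token usage for one session or sub-agent run (hypothetical numbers)."""
    name: str
    input_tokens: int
    output_tokens: int


def quota_used(runs: list[Run]) -> int:
    """Assumed accounting: every token from every run counts once."""
    return sum(r.input_tokens + r.output_tokens for r in runs)


runs = [
    Run("main session", 40_000, 5_000),
    Run("sub-agent: tests", 60_000, 2_000),
    Run("sub-agent: search", 80_000, 3_000),
]

main_only = quota_used(runs[:1])
total = quota_used(runs)
print(f"main context window saw: {main_only:,} tokens")
print(f"counted against quota:   {total:,} tokens")
```

The sub-agents keep the main context window small, but the total that counts is still the sum.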