Comment by abuani
2 days ago
I take a peek every month or so at spend for my company and notice more and more people are consuming $1k in tokens a month, and it is bewildering to me how. I use LLMs daily and see anywhere from $200-$400 tops. This is using the most expensive models, in deep thinking mode, so I'm not a Luddite against using them. I just can't figure out _how_ to burn that much money a month responsibly.
I genuinely challenge someone spending $5-$10k a month to demonstrate how that turns into $50-$100k in value. At a corporate level, I'd much rather hire a junior engineer who spends $100-$200/month and becomes productive than try to rationalize $100k/year in token spend.
> I just can't figure out _how_ to burn that much money a month responsibly.
From my experience, this happens essentially by three means:
- Level 0 (beginner users), long-lived conversations: If you don't get in the habit of compressing, or otherwise manually forcing the model to summarize/checkpoint its work, you will often find people perpetually reusing the same conversation. This is especially true for _beginners_ who did not spend time curating their _base_ agent knowledge. They end up with a single meta-conversation with a huge context where they feel the agent is "educated", and feel like any new conversation with the agent is a loss of time because they have to re-educate it.
- Level 1 (intermediate users) heavy explicit use of subagents: Once you discover the prompt pattern of "spawn 5 subagents to analyze your solution, each analyzing a different angle, summarize their findings", it can become addictive. It's not a bad habit per se, but if you're not careful it can drastically overspend your credits.
- Level 3 (expert users), extreme multitasking: Just genuinely having 10 worktrees perpetually in parallel and cycling between them in between agent responses. Again, not necessarily bad in itself, but it can exponentially consume credits.
> Just genuinely having 10 worktrees perpetually in parallel and cycling between them in between agent responses. Again, not necessarily bad in itself, but it can exponentially consume credits.
I'm pretty sure that growth is linear.
If you think about it, the production quality is probably log-linear, so the token growth may well be exponential.
Not quite the same scenario, but it's already plausible to have a situation where every subagent is allowed to spawn multiple subagents, in which case we'd have literally exponential credit consumption growth...
I think that you send the entire conversation with every request.
I’ve seen another pattern, I call it “The Document Mongerer”:
I regularly work in a largish monolith. We have micro services too, but most things are in the monolith. Over the years there have been multiple pushes to split it up into micro services. These efforts invariably fail because the _goal_ is the micro service architecture itself instead of something useful to the company, like the ability to do fast releases or better organized code.
Anyways, in the past few months I’ve seen multiple people individually ‘attack’ this insane goal with AI. The first step is always to generate massive amounts of documentation describing the current state of the code and proposing areas to split up. Then, after the engineer generates this huge store of documents, they say ‘look what I created’, and then drop it and move on to some other shiny toy. No one will ever read these documents. They are out of date before they ever get ‘completed’; their sole purpose is to waste credits.
Missing here: some organizations were rewarding high token usage as productivity without critical evaluation. People were afraid to be in the bottom because outcomes weren't being measured.
It is a giant Goodhart's law lesson
Give your agent a perfectly working code, insist that the output is not what it should be. Go to lunch. By the time you come back, the poor thing will evaporate a small lake trying to figure it out.
What!? Companies rewarding high token usage? That's inane, insane, and small-brained. Who in their right mind equates spending more money with being more productive? I'll just set up some burn jobs to kill tokens unnecessarily, and then everyone else will too, and the company will go bankrupt in 10 days. It seems inconceivable for a company to set up a "who can spend the most of our money" leaderboard in any other context.
Totally agree!
Bonus level "I have a hammer, all I see is nails": using Claude Code for random non-coding work, like dataset cleaning. It's really convenient to have a script spawning Haikus via `claude` CLI and feeding them prompts and JSON files. Money burn potential: practically unbounded, but also it's real work that the product people wanted done, so of course it has a cost associated with it. I'd be bewildered if anyone complained.
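A minimal sketch of that pattern, assuming the `claude` CLI's non-interactive print mode (`claude -p`) and `--model` flag; the prompt wording, model alias, and function names here are mine, purely illustrative:

```python
import json
import subprocess

def build_prompt(record):
    """Prompt for one JSON record; keep it terse, since every byte is billed."""
    return (
        "Normalize this record: fix casing, trim whitespace, "
        "return valid JSON only.\n" + json.dumps(record)
    )

def clean_record(record, model="haiku"):
    """Spawn one `claude -p` call per record and parse its JSON answer."""
    out = subprocess.run(
        ["claude", "-p", build_prompt(record), "--model", model],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)
```

One fresh prompt per record, so cost scales linearly with dataset size, which is exactly how the burn gets "practically unbounded".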
Where is level 2?
It’s probably unary interpreted as binary, hence there is no level 2. Level 3 is followed by level 7. Level n is followed by level 2n + 1. Exponential growth. The singularity is near.
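The recurrence in the joke really does generate the levels mentioned upthread, and 2 never shows up:

```python
def levels(count):
    """Level n is followed by level 2n + 1: 0, 1, 3, 7, 15, ...
    i.e. 2**k - 1 -- all-ones in binary, so level 2 never appears."""
    out, n = [], 0
    for _ in range(count):
        out.append(n)
        n = 2 * n + 1
    return out

print(levels(5))  # prints [0, 1, 3, 7, 15]
```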
Still waiting for the output of that agent
LLMs can't count well.
There isn't one. Level 3 is just that much more advanced.
We dropped it before level 3 was released.
level 99 - They're using Gas Town
I’m basically doing lvl 3. Every port in each local worktree’s .env is guaranteed to be unique across all worktrees. Skills for agents to start their own managed dev server, launch their own isolated instance of Chrome, etc., and literally end-to-end code and debug the entire app. I do have to say, though, you have to know the app inside out and have a pretty well-groomed backlog in order to run them all in parallel and actually benefit from it.
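One hypothetical way to get those per-worktree unique ports (a sketch of the idea, not the commenter's actual setup): derive each port deterministically from the worktree path, so every agent's dev server lands on its own port without coordination.

```python
import hashlib

def worktree_port(worktree_path, base=3000, span=1000):
    """Derive a stable dev-server port from the worktree path.
    The same path always maps to the same port; collisions are
    possible in principle but rare for a handful of worktrees."""
    digest = hashlib.sha256(worktree_path.encode()).digest()
    return base + int.from_bytes(digest[:4], "big") % span
```

An agent skill can call this at startup to pick its server and browser debug ports instead of hardcoding them in .env.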
Would love to learn more on how you do it. The various skills, tasks, workflow. If you have time and can share it. That would be valuable. :)
as a new user of agents, i am realizing i'm using a strategy basically identical to level 0. is the typical approach to just make a CLAUDE.md/AGENTS.md and start a new thread for each task or is it more complicated than that?
No, it's not. Your context should be SMALL.
https://www.youtube.com/watch?v=-QFHIoCo-Ko
I spend about $3k/month (subsidized by the Claude Max plan).
I guess I fall under level 3 (2?): I typically have 3-6 agents working simultaneously on the same feature, they each make worktrees, code, run tests and put up PR’s. I also have Github actions which scan for regressions and security issues on each PR.
It makes my development cycle extremely fast: I request a feature and just look at Github and look for changes to my human readable outputs, settle on a PR, merge, repeat.
The issue is that I am now the bottleneck in my system. I find myself working basically non-stop, because there is always more to do. (Yes I know I can automate the acceptance criteria but that turns to slop real fast)
So LLMs produce PRs for you, and you quickly merge them? Does anyone besides you have even a little look at them?
It's interesting how quickly automating others out of a job turns into automating ourselves out instead.
How do you compress or otherwise force a model to checkpoint?
What about Level 2?
>> Again, not necessarily bad in itself,
yeah, it is bad. The human brain is not able to properly assess this amount of change. Understanding even a small change takes a lot of capacity. Understanding thousands of lines is impossible.
This is pure slop pouring into prod, and we can see more and more consequences of this in all the big corps' products - things break faster and faster.
The thing I keep coming back to is - does it matter?
Really, does it matter if a company produces something that breaks constantly or gets worse or slower? (See GitHub.) Megacorps have a wide moat and have forced out all competition, or they just buy them with low-interest loans.
The quality of products keeps getting worse and we can do nothing but live with it. So if that's the state of the world, why wouldn't you just push as many "features" as fast as possible. More is rewarded. Less is punished. Quality does not matter.
First: There's the obvious "If the company is letting me do it, I'll be wasteful." This includes not clearing/compacting the context often. Opus now has a 1M context window, and quality is good to at least 200K. So each query is burning a lot of tokens until you clear/compact.
People have already mentioned the size/complexity of the codebase. I'm new to my team and the codebase isn't huge, but it's large enough that there are plenty of parts I have little understanding about. When I'm given a task, then yes, I definitely go to Claude and ask it to find the relevant parts of code so I can understand the existing workflow before even attempting to change it.
The downside is that I don't build expertise. But the reality is that with Claude, I can get the work done in 1 day that would take me 5 days of struggling, and if everyone is doing it, I can't be left behind. So I take the middle route - I get it done in 2-3 days instead of 1 so I can at least spend some time with the code.
Especially with AI, the rate at which code changes in our codebase is insane. So I built a tool that takes a pull request, and tells the LLM to go deep and explain to me what that pull request does. (Note: I'm not the reviewer, I just want to keep tabs on the work that is going on in the team).
And this is just the beginning. I haven't actually spent time to come up with more ways to use the LLM to help me.
My usage is similar to yours, but if I were fairly experienced with the code base, I'd do a lot more. I haven't asked, but I suspect there are people in my team who go over $1K/month.
As always, the bottleneck is proper testing and reviews.
Edit: I'll also add that for not-so-important code used within the company, I suspect most people are going full-AI with it. For my personal (non-work) code, I just let the AI code it all - the risk is usually very low (and problems are caught quickly). If someone is using the "superpowers" skill, then even for basic features you can burn lots of tokens. I usually start with 20-40K tokens and end up with 80-90K tokens when it's finished. Which means that many of the requests prior to completion were sending in close to 80K tokens. Multiply that with the number of queries, etc.
Wasteful, but if someone else is paying ...
> This includes not clearing/compacting the context often. Opus now has a 1M context window, and quality is good to at least 200K. So each query is burning a lot of tokens until you clear/compact.
I see this repeated by others, including coworkers. It completely ignores caching. Caching itself is complicated, but the "longer context window = more expensive" is not 100% true and you are hampering yourself if you're not taking full advantage of large context windows.
You still pay for cache hits and refreshes, but the cost is lower.
The default Claude cache expires in 5 minutes. If you take a short break to review the code, talk to someone, or do anything other than continuously interact with the session it's going to get evicted and start over.
You can opt in to a 1-hour cache at a higher rate https://platform.claude.com/docs/en/build-with-claude/prompt...
Also anecdotally, caching has just been broken at times for me. I've had active conversations where turns less than 5 minutes apart were consuming so much quota that I doubt anything was being billed at the cache rate.
If you look at the actual cost of your Claude Code conversations, you'll see that the cost is overwhelmingly dominated by the cost of input tokens (cached). Because of how we construct persistent conversations, each cached input token incurs cost on each API request, meaning that component of cost scales with O(request count). If you graph the cost curve of a claude code session, it's very obvious that this scaling factor overwhelms the cache discount.
Here is a blog post that shows some data - https://blog.exe.dev/expensively-quadratic. And I can confirm this is true for Claude Code - I set up a MITM capture for all Claude Code requests and graphed it.
So increasing Request Count that reuses the same prefix (which is what higher compaction thresholds do) really does lead to (substantially) higher API costs.
Caching is pretty simple. If it's a prefix match, it's cacheable. Very long context windows will be much more expensive than shorter ones, even with caching, assuming you're using Claude Code or some similar harness for both. You'll get caching in both, but you'll pay more for the longer context. The cost of occasional compaction is more or less negligible compared to the massive cost of the input tokens that are getting charged repeatedly for every single request.
If you have 500k context, three turns will burn ~1.5MM tokens. If you have 250k context, three turns will burn ~750k tokens. If you have 125k context, three turns will burn ~375k tokens. Claude can at most generate 32k output tokens per turn in Claude Code (and it rarely does so), so despite the higher price of output tokens, almost all costs are dominated by input token costs. Even at cached input prices, cost scales near-linearly with context length: if you 2x your context length, you'll roughly ~2x your cost.
Now, it might be the case that longer context windows allow Claude to complete the task better — although I'd be surprised if there were many tasks requiring >200k tokens just to get the job done (that's nearly ten full copies of Shakespeare's "A Midsummer Night's Dream"). And they're definitely convenient, in the sense that you don't need to think about context management as much and worry about a sudden, unexpected autocompact wrecking things if you aren't carefully manually compacting at logical points. But they're definitely more expensive on a near-linear basis and you're paying for that convenience.
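The arithmetic a couple of paragraphs up can be sketched directly. This is a toy model (my simplification, not Anthropic's actual pricing formula): assume every turn resends roughly the whole context as input.

```python
def input_tokens(context_len, turns):
    """Toy model: each turn resends (roughly) the whole context
    as input, so billed input tokens scale with context * turns."""
    return context_len * turns

# The figures from the comment above:
print(input_tokens(500_000, 3))  # prints 1500000
print(input_tokens(250_000, 3))  # prints 750000
print(input_tokens(125_000, 3))  # prints 375000
```

Even before accounting for within-session quadratic growth, halving the context length halves the input-token bill for the same number of turns.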
It’s crazy that people don’t understand cached tokens despite them being priced separately on the cost pages of every single provider.
> But the reality is that with Claude, I can get the work done in 1 day that would take me 5 days of struggling,
Is it really a 5x ROI? Where are all the apps, games, platforms, SaaS products, features that have been backlogged for 5 years that are all of a sudden getting done? Because I see a modest ROI, and an _awful lot_ of shovelware.
> I'm new to my team and the codebase isn't huge, but it's large enough that there are plenty of parts I have little understanding about.
When you're new to the codebase, things that take an experienced colleague one day to do can take a newbie 3-5 days to do.
> First: There's the obvious "If the company is letting me do it, I'll be wasteful." This includes not clearing/compacting the context often. Opus now has a 1M context window, and quality is good to at least 200K. So each query is burning a lot of tokens until you clear/compact.
What is wasteful? If you are costing the organization $x/hr, and spend an hour saving the company $(x*0.5), you didn't save money, you wasted it.
To the company, are you spending more time being token efficient to save less money than they're paying you for the time? That's not even getting into opportunity costs.
There is some extreme wasteful spending of AI tokens out there. But trying to get below $3k/month in token costs is often of questionable value.
I have anecdotal examples of Claude Code choosing a solution to a problem that is ridiculously token-inefficient.
One example - was giving several agents different sub problems to solve in a complex ML / forecasting problem. Each agent would write + run + read a jupyter notebook. This worked ok, the notebooks would be verbose but it was fine... until one of them wrote out hundreds of thousands of rows to a cell output, creating a 500MB ipynb file. Claude tried several times to read it and it used my entire context limit.
The solution was to prescribe a better structure for doing the work (via CLI analysis scripts + folders to save research results to). But this required some planning, thought, and design work by me, the operator.
When I see people spending $10k a month in tokens, I can only assume they are taking lazy hands off approaches to solving problems with the expensive hammer that is claude code. EX: have claude read all your emails every day... the lazy solution is to simply do that, but a smarter solution is to first filter the email body HTML to remove the noise.
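The "filter the email body HTML" step suggested above can be done with just the standard library before anything is sent to the model; a minimal sketch (class and function names are mine):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect readable text, skipping script/style blocks, so the
    model sees far fewer tokens than it would from raw HTML."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip:
            self.skip -= 1

    def handle_data(self, data):
        if not self.skip and data.strip():
            self.parts.append(data.strip())

def strip_html(body):
    """Reduce an HTML email body to its plain text."""
    parser = TextExtractor()
    parser.feed(body)
    return " ".join(parser.parts)
```

Marketing emails are mostly markup, so a pass like this can shrink the billed input by an order of magnitude before the prompt is built.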
> have claude read all your emails every day...
To be fair, I do that. 2-3 times a day, in fact. Not all of my emails (the archive has ballooned to several hundred thousand messages total), but the most recent ones certainly.
My standard prompt is along the lines of "go through the last N days of my emails, identify all threads that I need to know about, action on, or follow up with". N is usually a number between 2 and 5. I've specified a standing set of rules to easily know what is likely a source of noise, to aid in skipping the bot spam.
The company is charged API pricing through an enterprise contract, and I remain persistently curious how much I burn. My daily admin-related token expenses appear to fluctuate between $1 and $5. For something that saves me up to 2h of time a day I consider that a rather tolerable deal. (When I dive in to code to do refactors or deep investigations, I can spend as much as $25 a day.)
This is a good example of doing it in a deliberate way that absolute is worth the tokens etc... especially when you are keeping tabs on the cost vs time saved.
The example I was thinking of would be a vibe coder having it "read my emails every hour" only for claude to read the same 1000 emails over and over...
> have claude read all your emails every day
But that is exactly what it is sold to people to do as a panacea: consume all the data, produce insights.
Nobody is being instructed to be judicious. Everyone is being instructed to use it as much as possible for all problem areas.
I should have emphasized it better in my comment, but the nuance of "read all my emails every day" as a prompt can yield a wildly different solution than "read recent emails every day". The first may literally read all your emails over and over, which is a ton of tokens for little gain. The latter is orders of magnitude fewer tokens with the same if not more productivity.
The difference here is just one word in the prompt, but it serves as an example of how just a little bit of deliberate thought in one's prompt can yield massive efficiency in the outcome.
What's wild to see, both online and at work: non-engineers given vibe-coding tools quickly show their ignorance of the importance of deliberate design, and of the need for specific instructions, that one learns via coding. The "missing semicolon" meme is an example of the intuition we all developed early in our coding careers.
Many people are hoping AI can build and design for them, when in reality the deliberate design choices up front are as important if not more so than before AI.
If you make 500k and aren't spending 250k in tokens, you should get fired.
>> Nobody is being instructed to be judicious. Everyone is being instructed to use it as much as possible for all problem areas.
Do you think this is because the LLM owners have such a massive ROI to cover that they're actively encouraging teams not to be judicious, so you get into this vicious cycle where both the LLM vendors and the companies are burning through cash like crazy?
Really depends on the repo you’re working in.
If it’s very large, especially if the tool needs to refer to documentation for a lot of custom frameworks and APIs, you often end up needing very large context windows that burn through tokens faster.
If it’s smaller or sticks with common frameworks that the model was trained on, it’s able to do a lot more with smaller context windows and token usage is way lower.
The codebase and the topic you're working on are huge variables.
I don't use LLMs to write code (other than simple refactors and throwaway stuff) but I do use them heavily to crawl through big codebases and identify which files and functions I need to understand.
Some of the codebases I explore will burn through tokens at a rapid rate because there is so much complex code to get through. If I use the $20 Claude plan and Opus I can go through my entire 5-hour allocation in a single prompt exploring the codebase some times, and it's justified.
Other times I'm working on simple topics, even in a large codebase, and it will sip tokens because it only needs to walk a couple files to get to what it needs to answer my questions.
I'm currently in repos where the context window required is so large that the output is almost always "wrong" for the problem at hand. Quite a few people at my company burn through tokens this way, and it certainly isn't providing value to the company.
As always, improving accessibility for humans makes automation more effective. If the humans need to remember a PhD's worth of source code/documentation to contribute effectively, your codebase stinks.
Begs the question of whether we should move on to minimal microservices so that the whole project lives in the context of the LLM. I hardly have to do anything when I'm working on a small project with an LLM.
Why not take it a step further? Make each function in the codebase its own project. Then the codebase can fit into the context window easily. All you have to do is debug issues between functions calling each other.
In my experience, the result is just more crawling across the separate microservices and additional reasoning to confirm how it all fits together.
The monolithic codebases are easier to crawl for any problem that can't be conveniently isolated to a single microservice.
Ironically, this is accidentally begging the question - assuming that breaking them up into context-window-sized pieces would be good because it would fit them in LLM context windows.
Maybe you're right but I'm aghast at how much of engineering over the last 15 years has been breaking up working monoliths to fit better within the budget of an external provider (first it was AWS). Those prices can change.
There are good reasons to use microservices but so often they're used for the wrong reasons.
I've done the opposite, moving multiple tightly coupled repos into a single monorepo. Saves the step of the llm realizing there's a bigger context, finding the repo, then also scanning/searching it. Especially for fixes that are simply one line each in two repos.
Generally speaking no. Treat your IP (the code that runs your business, makes your business competitive or special) as precious and don't make it subservient to infra. It should be in the format (code, architecture, structure) that best serves it.
Orchestration between those services and the integration testing for any reasonably complex change can still be quite large.
The whole service might fit in a context window but the details of the system around it will still be relevant.
Yes, in a reasonable microservice land where the places you need to connect to are all documented in very concise places, you can have extremely productive $10 days. In a giant monorepo with everything custom, you can't just rely on built-in knowledge of 80% of your libraries, so it's a very different world.
A place like Google has to be so much better off just training library concepts in, given how many of the things the LLM will "instinctively" reach for are unlikely to be available. Not unlike the acclimation period when someone comes into or out of a company like that, and suddenly every library and infra tool you were used to is just not available. We need a lot more searching when that happens to us, and the LLM suffers from the same context issue. The human just has all of that trained in after 6 months, but the LLM doesn't.
> A place like Google has to be so much better off just training library concepts in
They did that, there was a special version of Gemini fine-tuned on internal code. But then the main model moves so fast that it is hard to keep such fine-tunes up to date and on the latest.
On larger repos it spends a lot of time just finding the one line of code that needs to change. (I have the same problem, as a human!)
So if the AI could do the same work on huge codebases with far fewer tokens, would it be good or bad for the AI companies do you think?
Unquestionably good. They want a product that provides value anywhere it's tried so as to establish the reputation as a magic human replacement. Gaming consumption based pricing at this point would be quitting before the race is over. They can always tweak the pricing knobs later once the industry is fully hooked.
It would be good for the first AI company offering this.
Will this result in people moving away from large monorepos to per-unit, quasi-micro repositories to save in token use?
> I just can't figure out _how_ to burn that much money a month responsibly.
Same, but in regards to quotas. I'm on the 200 EUR ChatGPT plan, so I presumably have the highest quota, using the "most expensive" models, at highest reasoning, in fast mode (1.5x quota usage), and after a full day of almost exclusively doing programming with agents, I still get nowhere close to hitting my quota.
In fact, since I started using agents for coding, the only time I even got close, was when I was doing cross-platform development with the same as above, but on three computers at the same time, then I almost hit my weekly quota. But normally, I get down to ~20% of the quota but almost never below that. I don't see how I could either, I'm already doing lots of prompts and queries "for fun" basically.
Codex quota is suspiciously high right now. Either way, the subscription plans are not sustainable, and perhaps less relevant to any discussion about corporate API use. The prosumer developer plans are an insane deal. It is a golden age right now and it will end. If you tried to use the APIs to achieve the same thing, you would be spending thousands upon thousands of dollars a month. My completely unfounded conjecture is that OpenAI is trying to grab developers back from Claude by burning $$$$.
> If you tried to use the APIs to achieve the same thing, you would be spending thousands upon thousands of dollars a month.
Yeah, obviously, not sure why anyone would be using APIs at this point; it seems bananas to spend more than 10 EUR per day when these "almost-endless" subscriptions exist.
> My completely unfounded conjecture is that OpenAI is trying to grab developers back from Claude by burning $$$$.
Unlikely; since the Codex TUI launched, OpenAI pretty much had every developer's pocket already, as the agent is miles and leagues ahead of Claude Code, pretty much from inception. No other provider comes close to ChatGPT's Pro Mode either. I don't even think it's a quota/pricing thing: have the best models and people will flock by themselves.
I don’t think anyone has to sell inference below cost. If Anthropic is GPU-constrained, then it makes sense for them to charge much much more on API users and push subscribers towards extra billing, because that’s the only knob they can turn. OpenAI has much more capacity based on news reports.
Codex quota is/was 2x its normal amount for some promotion or something. I thought it ran out today but can't check right now
I have to churn to get to my ChatGPT Plus $20 plan limits with gpt-5.5 xhigh. Starts to feel like I'm doing something wrong.
I am running a bunch of autoresearch loops that optimize various compilers, and it's pretty easy to burn through as much money as you want if you have a measurable goal and good tests.
> have a measurable goal and good tests
I have both of those, yet seemingly I'm not setting my goal in a way that supports "endless inference" like that. My goals eventually end, and that's when I move on. Optimization sure sounds like something you can throw a good amount of tokens/quota at, so yeah.
> if you have a measurable goal and good tests.
you can burn through the money even easier if you don’t have them
There are tools that let you work out what the API price would have been for your subscription-plan usage. I typically have monthly runs that are on the order of $2k-$4k at API prices, despite paying a mere $200/mo to Anthropic.
Edit: Just checked with ccusage and I've been doing about $450/day for the last week. A bit more than usual, but I still haven't come close to weekly limits and never hit the 5hr rate limit.
> Same but in regards to quotas. I'm on the 200 EUR ChatGPT plan,
The API rates and monthly plan rates are not the same.
If you're using enough to justify the 200EUR plan (instead of the 100EUR plan), your use might actually be as high as some of the API bills discussed above.
I'm on the same page. Do people not analyze the problems themselves? Are they just copy/pasting their entire ticket description into Claude Code and having it iterate until they land on something that works?
I don't get it.
> Are they just copy/pasting their entire ticket description into Claude Code and having it iterate until they land on something that works?
That is exactly what they are doing, yes
That's my take as well. I've had my unPRed branches grabbed up and blindly merged by an agent twice now. The guy doing it was shocked both times that his PR had my change sets in it.
Also one engineer is treating the code as assembly. I've asked some pointed questions about code in his PR and the response was "yeah, I don't know that's what the agent did".
Edit:
To everyone freaking out about the second guy: yeah, I think being unable to answer questions about the code you're PRing is ill-advised. But requirement gathering, codebase untangling, and acceptance testing are all nontrivial tasks that surround code gen. I'm a bit surprised that having random change sets slurped up into someone else's rubber-stamped PR isn't the thing that people are put off by.
It's bizarre to me that people being paid to use their brains, with a job title including the word "engineer" (which essentially means "clever thought thinker" in Latin), are just offloading all of their thinking to a bot instead of using it as a way to ensure clean execution and faster understanding of the structures of underdocumented projects.
And why wouldn't they? Companies are quite literally instructing them to do so. I work at such a company and have heard similar anecdotes from colleagues that work at other companies.
To be fair, taking an average SWE at $160k/y, spending $1k/m to offload mechanical ticket work from their working set sounds like a bargain to me. They could be spending the time on design and planning, working on new things, and figuring out how to save costs via optimizations. In fact, for every soul-sucking mechanical task you offload, the better off you are overall.
It’s not like AI is the first time this happened. CI/CD and extensive preflight and integration and canary testing is also a way of saving engineer time and improving throughput at the cost of latency and compute resources. This is just moving up the semantic stack.
Obviously as engineers we say “awesome more features and products!” but management says “awesome fewer engineers!” either way pasting the ticket in and letting a machine do the work for a fraction of the cost was the right choice. There’s no John Henry award.
That'd be crazy. The agent has a skill configured to fetch ticket descriptions from Jira by itself. Copy-pasting feels like manual labor.
Not what I do. I'll reformulate the ticket description so that the purpose and as many details as possible about the solution are made clear from the start. Then I tell Opus to go and research the relevant parts of the codebase and what needs to be done, and write its findings to a research.md file. Then I'll review that file, bring answers to any open questions and hash out more details if any parts seem fuzzy. When the research is sound I'll ask Opus to produce a plan.md document that lists all the changes that need to be made as actionable steps (possibly broken into phases). Then I'll let Sonnet execute the steps one by one and quickly review the changes as we go along.
You are making it too hard on yourself. Most people would just paste the ticket URL and type "fix this", then spend the next 3 hours on social media.
OTOH, I try hard to provide all possibly relevant context, manually copy/paste logs to reduce context overhead, always ask to produce an implementation plan and review it before making any code changes. Yet I often feel like a dinosaur here, all coworkers who tout "LLM productivity" just type a few words in and let the agent spin for hours without any guidance.
1 reply →
> Are they just copy/pasting their entire ticket description into Claude Code and having it iterate until they land on something that works?
"Their ticket" = that was AI generated. After which they will wait their AI generated PR be checked by an automated AI QA that will validate against the AI generated spec.
It feels like an important metric of "corporate AI adoption" should be how effective the human is at steering the AI.
IF THE HUMAN ISN'T EFFECTIVE, THE HUMAN NEEDS TO GO.
You should.
If it manages to produce a working solution, then it's great! Why would you waste your time on it?
If it fails, then it's also great! You prove your value by solving the ticket, which can be a great example of where a human can still prevail over the AI (joke: AI companies might be interested in buying such examples)
(All assuming that your time cost is pricier than token spending. Totally different story if your wage is less than token cost)
Actually no. We ask business analysts to supply documentation for whole products. We use AI to analyze that documentation and after that we use AI to create tasks in Jira. Business analysts will review them.
After that we use AI to translate the tasks to a more technical view.
After that we use AI to implement the tasks.
After that we use AI to review the tasks.
After that a human QA tests the tasks.
If all is good, the code is merged and lands in production.
And yes, we burn a lot of tokens but the process is very fast. It takes months instead of years.
> Are they just copy/pasting their entire ticket description into Claude Code and having it iterate until they land on something that works?
There's also the pattern of creating an army of agents to solve problems. A human writes a plan. One agent elaborates on it. Another reviews it and makes changes. Another splits it up into tasks and delegates them out to multiple agents who make changes. Yet another agent reviews the changes, and on and on. All working around the clock.
If Uber is like most other companies, there's a leaderboard for AI tokens consumed. If maximizing your token usage is going to get you to the top of the leaderboard, and therefore promoted for "productivity", people are going to find creative ways to be "productive".
The tokenmaxxing leaderboard where I work has a lot of new hires on it
Asking the LLM to spawn a subagent per file and look for bugs is a good way to waste a lot of tokens real fast for leaderboard success, and it's pretty defensible as useful work if someone tries to call you on it.
1 reply →
One thing that stands out is that it sounds like you're using LLMs for only one part of your process. You're having LLMs help you write code, but the code you're writing doesn't itself make use of LLMs.
My current job basically involves trying to improve processes that themselves make heavy use of LLMs. Once you have multiple agents in parallel running multiple experiments on improving the performance of primarily LLM driven tools it's not that hard to get your token usage pretty high.
Claude is a mediocre programmer that can do great things with great supervision, but it can't make mediocre human programmers into good ones, because they can't provide great supervision.
It will try and try and try, though.
I'd bet it's the LLM doom loop: vaguely ask it to do something, tab to news.ycombinator.com for 30 minutes, tab back, notice it misunderstood the prompt. Restart with a new improved prompt, tab back to HN.
So yeah, probably the same thing people did anyway, just now it's generating time instead of compile time.
We opened the Claude Code floodgates all at once in my org. After a few months we looked at stats, and asked managers for impressions on performance changes. The API cost per engineer doesn't correlate with the apparent increases in performance, but it sure seems that the vast majority of people that used to have good reviews got a lot better, while the bottom third just didn't, even though they use the LLMs about as much. It makes the performance differences in teams look like an abyss. Someone appears stuck in a task, and we see what they've been prompting, and then one of the best seniors comes in, actually asks the questions well, and the LLM does all the debugging and all the fixing in 20 minutes.
It's not that the best performers are magical prompt engineers providing detailed instructions: They ask better questions that the LLM knows how to try to answer, and provide the specific information that the LLM would take a while finding. It's as if some people just had no "theory of mind" of the LLM, and what it can know, and others just do. It's not a living thing or anything like that, but it's still so useful to predict it, put yourself in its shoes, so to speak. Just like you'd do with a new hire, or a random junior.
1 reply →
Several ways to burn that amount of money without specifically looking to tokenmaxx:
- Agents that spawn other agents
- Telling agents to go look at the entire codebase or at a lot of documents constantly
- MCP/API use with a lot of noise
- Loops where the agent is running unattended.
I do think it's not really responsible use and a loop where the agent is trying to fix CI for one hour for something that would take you five minutes (for example) is absurd. But people do that.
One of the new dynamics is a loop between a "code review" LLM and a "fix LLM". It's super annoying because the code review LLM often finds more bugs on a follow-up review that were there from the beginning, but at least I can loop both until check go green.
I spend 400-500 dollars per day during active development at this point. However with more aggressive task breakdowns I can spend ~5k per day.
These spend rates are in part due to operating on a larger code base. Operating on a larger code base means more time searching and understanding the code, tests, test output. They are also due to going all-in on agentic coding.
It can feel painfully slow to go back to coding by hand when for a dollar you can build the same functionality in a minute. Now do this with multiple sessions and you can see where the cost goes.
Your reply answers how you are able to spend the money, not whether it returns sufficient dollar value per dollar spent.
> I genuinely challenge someone spending $5-$10k a month to demonstrate how that turns into $50-$100k in value.
The problem with HN is that everyone here thinks like an engineer, not like a business owner.
$10k a month on tokens is just not that much when you're already making $2M per engineer. If their productivity has increased even 10% then the spend was well worth it.
Case in point, Meta made 33% more revenue this earnings report. Now you can nitpick and ask for attribution down to the dollar, but macro trends speak for themselves.
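The arithmetic behind that claim can be checked on the back of an envelope. A quick sketch in Python, using the commenter's own figures (the $2M revenue per engineer and the 10% lift are their assumptions, not measured data):

```python
# Back-of-envelope ROI using the figures above: $10k/month in tokens
# vs a 10% productivity lift on $2M of revenue per engineer.
token_cost_per_year = 10_000 * 12      # $120k/year in tokens
revenue_per_engineer = 2_000_000       # claimed revenue per engineer
productivity_lift = 0.10               # claimed 10% improvement

extra_revenue = int(revenue_per_engineer * productivity_lift)  # $200k
net = extra_revenue - token_cost_per_year
print(net)  # → 80000: positive even before counting salary leverage
```

By this framing the spend pays for itself at any lift above 6%; the real argument is over whether the lift exists at all.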
4 replies →
I've been working on a project to build a new Postgres-based database in Rust[0]. I'm four weeks in and have 93% of the Postgres test suite passing. I've found agents to have worked really well for this as I have an existing codebase with good architecture that I can point my agents at. It's also easy to debug as I can diff what my agents are doing against what Postgres is doing.
I've had to get multiple codex accounts, but there was a brief period of time where I tried API usage to see how expensive it would be. In about an hour I spent $650 of credits. I had codex estimate how much I would be spending if I was doing pure API usage and it estimated around $10k/week.
For context, Postgres is 1M lines of C code. It's looking like pgrust will come out at fewer lines of code than Postgres, and at peak I was adding over 100k lines of code in a day. I would estimate it would take a team of 5 software engineers at least 3 years to get to where I got in a month with a couple of Codex subscriptions.
[0] https://github.com/malisper/pgrust
> responsibly
There’s your problem. You’re trying to be responsible instead of trying to burn tokens so you can have your name on top of some leaderboard for most wasteful AI users.
The perverse incentives created by these AI leaderboards are crazy.
The leaderboards are dumb, but I understand the point of telling people not to worry about tokens and just use it. They are trying to get people to try it, to discover new uses without asking “is this worth testing”. It’s basically early R&D budget. Eventually these companies will decide it’s time to transition into efficient usage.
1 reply →
But we need OKRs rocks and METRICS! Everyone must have their own one numberrrrr!
I don't use automated agent workflows or anything, I just use Claude as a pair programmer of sorts. A month or so ago I used Claude Opus 4.6 for 2-4 hours on API pricing and racked up $20 in spend, which surprised me since that was much higher than my usual.
I don't know about $10,000, but I can see hitting $1,000 pretty easily if you aren't looking at the costs.
It really depends on the way you use AI. If you just prompt it for a task and either accept or reject the output, you won't spend much.
But if you are like me, you aggressively document and brainstorm before planning, you review that documentation with subagents, make modifications, you aggressively plan, you verify that plan with subagents, make modifications, have a large number of phases, plan again for each phase, write tests to cover 100%, implement each phase, do intermediate and final code reviews with subagents, apply fixes, write final documentation, and do all of this in parallel. If you have multiple tabs in your terminal each running Claude Code for 10-12 hours a day, then $5,000 per day is not much.
If you use Anthropic or Open AI subscription and you spend $1000 per month, you are not using AI much.
Are you bringing in at least $1.25M in additional yearly revenue to your company?
I think more than that.
And also some of us run tens of rounds of gradually improving the projects. And that burns tokens like crazy.
It turns out writing good prompts helps to keep token usage down as the model wastes fewer tokens discovering context it needs that wasn't hinted at in the prompt.
Whereas a good prompt will give solid leads to all the specifics needed to complete the task.
I spent $24,096.47 in "API" costs with my $200 Claude Code Max subscription in April.
I'm building my own SaaS. I spent 6 months writing the code by hand before using Claude, and that was fine, but it's much faster to give the exact specs to Claude and have 3-4 sessions working in parallel with me. When you validate changes with exact test specs there's much less correction you need to do. I always hit my weekly limit and it's far cheaper for me to use this than to hire someone and spend time onboarding them.
>I genuinely challenge someone spending $5-$10k a month to demonstrate how that turns into $50-$100k in value
At a lot of businesses $5-10k/mo of AI spend doesn't even translate into $5-10k/mo of value. Churning out code was rarely the business value bottleneck. It was convenient for everybody else to blame developers not writing code fast enough for their failures. Now they have no excuse, but I doubt they will own up.
You are probably guiding them step by step and reading the results. Maybe you also sit and wait for the results.
Agents can iterate on a problem for hours if they can see their results and be given a higher level goal to evaluate their progress toward.
When you have an agent working for minutes or hours, never wait on it. Use that time to spin up another agent.
You can also spin up several agents in parallel to attempt the same item of work and compare their results to choose which to work off for next steps, instead of rolling the dice on a single option at a time and gambling that it's better to refine that first attempt instead of retrying from the start several more times.
And if you are doing QA manually, you're missing out on having e.g. Codex's "Computer Use" or "Browser Use" automate your manual verification steps and collect a report for you to review more quickly. Codex can control multiple virtual cursors simultaneously in the background without stealing focus, to parallelize this.
If you want to use up more tokens to get more done (though more outside of your control and ability to review of course), that's how.
I'm working on some serious data analysis + realtime async code, and I use 200-400 million tokens a day with Claude Code alone (via ccusage). The complexity of the code seems to have a big impact on the number of tokens used. On simpler projects I use many fewer tokens.
My programming endurance is much greater now (2-3x focused hours per day), my productivity per hour is multiples higher, and I code seven days a week now because it's really exciting.
All told, I would pay for these tools as much as I would pay for full-time human programmer(s).
Multimedia feedback can burn much more than that, e.g. if I'm sending frames of a 3D engine's output. I would like to send it full video if I could, but that's too expensive, and I'm sure there are orgs out there that really do want every frame in a prompt. This can grow explosively depending on the application. I recently wrote a Milkdrop visualization analyzer; I could have sent thousands of frames for each one. I didn't, but I wish I could haha.
> I'd much rather hire a junior engineer who spends $100-$200/month
I'd much rather hire a junior engineer at $1.20/hour too! Can you hook me up with your contract services provider?
Obviously I know you're talking about AI costs only. But the idea of doing that analysis without looking at the salary of the person running the tool seems to be completely missing the point.
Now, sure, there are legitimate arguments to be made about efficacy and efficiency and sustainability and best practices. But, no, $100k/year absolutely doesn't need to be "justified" if it works. That's cheaper than the alternative, and markedly so.
> But, no, $100k/year absolutely doesn't need to be "justified" if it works. That's cheaper than the alternative, and markedly so.
If you're trying to say that 100k is less than 200k, you're right.
I don't see how any of that won't need to be justified. You can spend a lot of money and not get enough of a return...
FWIW, you're nitpicking a strawman. I put "justified" in scare quotes for a reason, qualified it with "if it works" (which is, quite literally, the definition of a justification) and put it immediately after a sentence enumerating a list of legitimate questions for debate (all of which would be part of any justification analysis).
You agree with me, basically.
The core point is that these very large AI bills are not actually large in context, as the pre-existing scale of expenses for software engineering are larger still and this at least promises to reduce those markedly.
To wit: argue about whether AI works[1] for software development, don't try to claim it's too expensive, it's clearly not.
[1] "Is justified" in the vernacular.
Yeah, I use Claude Code to do security reviews. For every CVE that Wiz flags, I have Claude Code check for reachability analysis.
I typically consume about $200/month doing this. Most of our engineers are in the $200-400 range, with a few people around $1,000.
But then there's one guy who's not only hitting $8,000, but supposedly has nearly 300,000 lines of code accepted (Note: This means he's accepted the lines of code from Claude, not that he's committed it). I can't figure out how.
Do lots of deep research and code reviews on large legacy codebases. I've created lots of documentation to reduce token consumption but it's still a lot of token consumption.
The answer may be agentic loops that keep cycling through the same problem again and again until they land on a non-erroneous outcome. Some people boast about having multiple such agents working in parallel on different problems, tending to one while another is processing, perhaps not unlike the movie mad scientist who runs around the lab throwing switches while laughing maniacally at the prospect of his impending success.
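That loop pattern can be sketched in a few lines. Here a stubbed `ask_agent` stands in for a real LLM API call; it is hardcoded to "succeed" on the third try purely to show the control flow and where the token burn accumulates:

```python
import subprocess

def ask_agent(prompt: str, attempt: int) -> str:
    # Stub for a real LLM call. In a real loop, every invocation here
    # burns a full round of tokens. This stub succeeds on the 3rd try.
    return "exit 0" if attempt >= 3 else "exit 1"

def agent_loop(goal: str, max_attempts: int = 5) -> int:
    prompt = goal
    for attempt in range(1, max_attempts + 1):
        cmd = ask_agent(prompt, attempt)
        # Run whatever the "agent" proposed; the exit code is the only
        # signal of success, so failures just feed back into the prompt.
        result = subprocess.run(["sh", "-c", cmd])
        if result.returncode == 0:
            return attempt  # non-erroneous outcome reached
        prompt += f"\nAttempt {attempt} failed; try again."
    raise RuntimeError("gave up after max_attempts")

print(agent_loop("make the tests pass"))  # → 3
```

Every failed pass costs as much as a successful one, which is how an unattended loop quietly multiplies spend.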
There was a tool posted called codeburn that showed a breakdown of what activity your usage was spent on. Mine was almost all coding but other people in the thread said >50% of their usage was conversation. I’m inclined to agree with you that someone who is reasonable with their compute usage is likely to be thinking things through rather than just burning tokens to get an LLM to solve the problem
In addition to what folks are saying here about larger code bases and multiple features at once, there’s also the time requirement to be efficient. It takes time to be more efficient with token usage and it may not be worth it for some of these companies so… burn away until we start to get more data and then we’ll check in.
> I just can't figure how _how_ to burn that much money a month responsibly.
I always have a few agents (2-5) doing research and working on plans in parallel. A plan is a thorough and unambiguous document describing the process to implement some feature. It contains goals, non-goals, data models, access patterns, explicit semantics, migrations, phasing, requirements, acceptance criteria, phased and final. Plans often require speculative work to formulate. Plans take hours to days to a couple of weeks to write. Humans may review the plans or derived RFCs. Chiefly AI reviews the code (multiple agents with differing prompts until a fixed point is reached between them). Tests and formal methods are meant to do heavy lifting.
In my highest volume weeks, I ship low hundreds of thousands of lines of software not counting changes to deps.
> At a corporate level, I'd much rather hire a junior engineer
Any formulation of a problem sufficient for a truly junior engineer to execute is better given to an agent. The solution is cheaper, faster, and likely better. If the latter doesn't hold, 10 independent solutions are still cheaper and faster than a junior engineer.
There is no longer any likely path to teaching a junior engineer the trade.
Just out of curiosity, what type of systems are you working on? What type of features did you implement on your 100k LOC week?
> What type of features did you implement on your 100k LOC week?
I work on 3rd party API integrations, of which, we have hundreds, each in its own repo. We need to build thousands more at a fraction of the cost. Any given integration historically takes a human a few days up to a few months to build and is subject to ongoing maintenance. We frequently do not have access to the API and we mostly never have a representative data set if we do. Complex APIs tend to expose multiple, entwined data models. Documentation may be wrong or in a foreign language.
I've been building a new framework to do it better. Ideally, we can get an agent to spit them out in a few minutes to hours with a much reduced ops burden for managing the fleet, all with very high confidence. The latter requires pushing as much into the type system as possible and leveraging static analysis. Much of the work has been embarrassingly parallelizable. Consider categorizing access patterns across the entire set or ensuring byte-for-byte parity (over the input space of third party API responses).
This is absolutely not a problem that a human or 2 could tackle prior to AI.
1 reply →
I don't know about the GP, but my workflow is similar to theirs, but I aim to ship low thousands of lines per week. The fewer the better. I even tell the agent to only write high SNR tests, otherwise it just adds useless "make sure this function returns this thing we hardcoded".
I usually succeed, BTW. I spend a lot of time planning, but usually each PR is a few hundred lines, and fairly easily reviewable.
I mostly work with Python backends, though these days it might be any language (Ruby, Go, TS).
I am sorry, I am probably just very dumb, but this sounds extremely wasteful. If this is a reflection of how software was made before AI I wonder how anything was ever made.
> In my highest volume weeks, I ship low hundreds of thousands of lines of software not counting changes to deps.
I'm suspicious that you actually get Claude to output that much usable code in a week, but maybe you do.
But I’m 100% positive that you’re not shipping even a small fraction of the amount of value that someone reading this 2 years ago would have expected from hundreds of thousands of lines of code.
I dunno, I've seen agents make boneheaded mistakes even a junior engineer wouldn't make. Treating them as strictly better than junior engineers is a problem, not just for that reason but because you're effectively killing the pipeline for senior engineers. Then what?
> I dunno I've seen agents make boneheaded mistakes even a junior engineer wouldn't make.
Yes, of course.
> you're effectively killing the pipeline for senior engineers. Then what?
I honestly don't know _what_. It's a prisoner's dilemma.
You will burn yourself out in months at that level of daily context switching.
It isn't worth it.
> In my highest volume weeks, I ship low hundreds of thousands of lines of software not counting changes to deps
But what do they actually do?
I keep seeing people wax poetic about the mountains and mountains of code that LLMs are dumping out, but I'm yet to see anywhere near a proportionate amount of actually useful new apps or features. And if anything the useful ones I do find are just more shovels for more AI. When do we get to the part where we start seeing the 10x gains from the billions of lines of code that have probably been generated at this point?
On the OpenAI side, GPT-5.5 generates spend at a prolific rate that's even faster if you use it through an ACP connection in a tool like Zed. I used to never think about Codex rate limits and now I'm hitting mine every 5 hour block and spending ~$100/day on top of that in adhoc credit purchases.
I think companies are charged API prices vs individual prices. That alone is 10x for Anthropic. Not sure though.
Don't underestimate corporate waste. If it's not someone's job to care for something, they really won't.
Even before this AI wave, it was common for me to see dev environments on AWS costing like $3k/month that hadn't been used in months.
I use it as an IDE. I am a security engineer, but there are a bunch of predictable things I need to write code for: onboarding logs, writing detection rules, SOAR-type stuff. It makes a diff and locally tests all the permutations I describe, then I review the code.
I don't think it's about value. Tokenmaxxing is a thing now since that one CEO said he wants his $250k/yr devs to use $400-$500k/yr in tokens, so now it's all about how many agents can you have running concurrent tasks all day long.
> I use llms daily
this is your “problem” - you are missing the “nightly” part. on my box LLMs run 24/7 :)
In our org it's people that have too much stuff in their context, every mcp in the world installed, GTD, PAI, OpenClaw. I'm equally baffled how one can spend that much money during their day to day.
I also don't think a lot of people know some of the more advanced context management tricks like /rewind /fork /tree to take advantage of prefix caching
Your estimates do not account for speed of delivery. If an AI can deliver 10x faster, the target is less than 10x a dev salary.
But 10x faster also gets you to market sooner. Which has value.
I could argue in all the ways my personal experience disagree, but lets just Occam's razor:
Most people agree big orgs regularly have dysfunctional incentives. We've seen it happen a thousand times.
Your suggestion requires we also assume 10x faster delivery from people spending $1,000 vs $200 - something I've yet to witness or hear a credible account of.
So while that might be true in a small number of cases, in general it's foolish to go with the "10x delivery speed" hypothesis.
1. Worktrees
2. Multiple simultaneous projects
3. Orchestration that includes handling of CI workflow
4. Active work to further improve or refine tooling
5. Experimentation producing muscle memory as experience versus code output
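Point 1 above is mechanical to set up. A minimal sketch of parallel worktrees, driving git from Python in a throwaway repo (the branch names are made up for illustration):

```python
import pathlib
import subprocess
import tempfile

def git(*args, cwd):
    # Run a git command, failing loudly on any error.
    return subprocess.run(["git", *args], cwd=cwd, check=True,
                          capture_output=True, text=True)

# Throwaway repo with one commit so worktrees have a base to branch from.
repo = pathlib.Path(tempfile.mkdtemp()) / "repo"
repo.mkdir()
git("init", "-q", cwd=repo)
git("config", "user.email", "dev@example.com", cwd=repo)
git("config", "user.name", "dev", cwd=repo)
(repo / "README.md").write_text("demo\n")
git("add", ".", cwd=repo)
git("commit", "-q", "-m", "init", cwd=repo)

# One worktree per task: each gets its own branch and directory, so a
# separate agent session can churn in each without touching the others.
for task in ["feature-a", "feature-b", "feature-c"]:
    git("worktree", "add", "-q", "-b", task, f"../{task}", cwd=repo)

worktrees = git("worktree", "list", cwd=repo).stdout
print(worktrees)
```

Each checkout can host its own agent session running flat out, which is exactly how per-seat spend multiplies by the number of worktrees.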
It turns into 50k to 100k or more of value for the employee the moment upper management made AI spend a personal performance target across most corporations.
It's easily explained. People are losing their skill in real time and literally cannot develop anymore without AI. That's it.
They keep forgetting to put "make no mistakes", "think deeply" and "get it right the first time" in their prompts.
When people have no ability to understand what they are doing, they will just rerun it endlessly hoping they get something passable. When that doesn't happen they burn money.
I doubt most of this is from rerunning the same prompts over and over. This token burn is more likely from people using swarms of agents and orchestrators for “efficiency”.
“I’ve got 2 dozen agents churning through the backlog to build this feature that would take one agent an hour to implement.”
managers call meetings and agents call swarms.
Key word doing A LOT of lifting there: "responsibly"
At least your workplace doesn't frame raw usage as a leaderboard, with awards given out for topping it
You're probably generating new code rather than analyzing old code for "improvement".
> in deep thinking mode
You mean deep brute-force mode of search results parsing themselves…
$400 * 23 business days would be $9k. Sounds ballpark to me
Many companies actively hide the cost from their employees.
Do you run 20 Claude Code agents on Max for 8 hours a day? :)
a good way to prevent companies from adopting AI (and keeping your job) is to waste tokens making AI cost prohibitive
That would be true in a sane world with investors who value profitability. But everything is now focused on DAU and the network effect. Overusing their services might actually make them look better to investors who shovel more money to them to light on fire.
Idk about Uber in particular, but aside from legit programmers using AI to help them do legit work faster, there are people spamming it for metrics. And the hiring pipeline has gotten screwed up somehow, like half the people who reached the onsite interview for a technical role lied about all their technical skills, or they didn't lie and manage to pass hiring but then only take the tasks that AI can solo. And if it can't, waste tokens until giving up.
Advanced agentic prompting.
Try the Jira MCP server.
Slop architecture leads to compounding problems that people try to solve with more slop. If one wants to control the quality of the code then the throughput and multithreading is bottlenecked by how much code one can comprehend in a given period of time.
> notice more and more are consumed $1k in tokens a month
I've said it before: if you allow people to see how much others spent, they will try to climb up the "leaderboard".
It takes just ONE little praise for using tokens or one perk gained, and the GAME IS ON among the developers!
I used CC frequently for development, Opus 4.7 with high thinking, with a $100 Max subscription, and haven't been rate limited yet. IMO a subscription is the way to go as it puts a ceiling on spending.
> I just can't figure how _how_ to burn that much money a month responsibly.
Well, if your bonus depends on spending it, you'll find a way.
My observation is - pasting long documents is a great way to burn tokens. Turn based conversation, even a very deep and technical one, consumes less tokens than "read these logs and tell me where the problem is". Ironically, the log reading example is a perfect use for a local LLM.
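The scale of that difference is easy to estimate. A rough sketch, assuming ~4 characters per token for English-like text (a common rule of thumb, not a real tokenizer):

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English-like text.
    # Real counts vary by model tokenizer; this is only an estimate.
    return max(1, len(text) // 4)

log_dump = "ERROR connection refused at 10.0.0.1:5432\n" * 5000
chat_turn = "The connection to the database is refused, why?"

print(estimate_tokens(log_dump))   # one pasted log file: tens of thousands
print(estimate_tokens(chat_turn))  # one focused conversational turn: ~dozen
```

A single pasted log dump can cost thousands of turns' worth of input tokens, and it gets resent with every subsequent message in the conversation.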
[dead]
[dead]
[dead]
[dead]
In your fictional world you hire a junior who will write code manually, right?
First, I interview people. Junior skills in manual coding dropped sharply this year. These are people who started their schooling coding manually and switched mid-course. In two years there will be no such people.
well, that will never happen anymore in this world unless we go back to caves, especially for juniors. A junior who writes good code is already a dying unicorn.
The outcome will be ... you will hire a junior ... who will burn more tokens, and the chances of mistakes with a less expensive model and fewer tokens are even higher.
> well, that will never happen anymore in this world unless we go back to caves
The bubble is an echo chamber.
I'm interviewing juniors. Their manual skills drop sharply, and that's for people who went to school in the manual age - maybe only last year it stopped being manual. Let's see what it looks like in a year or two lol
2 replies →
Puh, not good signs at all.
I mean even the normal people we get in interviews have no clue, like 80% are just ignorant.
I stopped an interview after 5 minutes: when I asked what ls -ahl does, he started telling me how he vibe/AI codes stuff and that's his workflow. Okay, if you don't know the basics, guess what? Everyone can replace you, or at least I'm not hiring you (I only told him that's not what we are looking for and thanked him).
we are doomed :D
The fully loaded cost of a senior engineer is already well past 400k. +5k a month is not that much if it helps them be XX% more productive. Personally at a different big tech I'm in the mid 4 digits AI spend per month and it helps me a lot, basically all coding has been trivialized and I work on an extremely large codebase. I'm spending more time on things closer to direct value generation like data analysis and experiment tweaking rather than spending time moving a variable across 10 layers of abstraction and making sure code compiles.