Comment by Galanwe
2 days ago
> I just can't figure out _how_ to burn that much money a month responsibly.
From my experience, this happens essentially by three means:
- Level 0 (beginner users) long-lived conversations: If you don't get in the habit of compressing, or otherwise manually forcing the model to summarize/checkpoint its work, you will often find yourself perpetually reusing the same conversation. This is especially true for _beginners_ who have not spent time curating their _base_ agent knowledge. They end up with a single meta-conversation with a huge context in which they feel the agent is "educated", and any new conversation with the agent feels like a waste of time because they have to re-educate it.
- Level 1 (intermediate users) heavy explicit use of subagents: Once you discover the prompt pattern of "spawn 5 subagents to analyze your solution, each analyzing a different angle, summarize their findings", it can become addictive. It's not a bad habit per se, but if you're not careful it can drastically overspend your credits.
- Level 3 (expert users) extreme multitasking: Just genuinely having 10 worktrees perpetually in parallel and cycling between them in between agent responses. Again, not necessarily bad in itself, but it can exponentially consume credits.
> Just genuinely having 10 worktrees perpetually in parallel and cycling between them in between agent responses. Again, not necessarily bad in itself, but it can exponentially consume credits.
I'm pretty sure that growth is linear.
If you think about it, production quality probably grows only logarithmically with spend, so to get linear quality growth the token usage may well have to grow exponentially.
Not quite the same scenario, but it's already plausible to have a situation where every subagent is allowed to spawn multiple subagents, in which case we'd have literally exponential credit consumption growth...
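A back-of-the-envelope sketch of that scenario (the branching factor, depth, and per-agent token budget here are all made-up numbers for illustration):

```python
# Hypothetical model: each agent spawns `branching` subagents, down to
# `depth` levels, and each agent consumes `tokens_per_agent` tokens.
def total_tokens(branching: int, depth: int, tokens_per_agent: int) -> int:
    # There are branching**k agents at level k, so the agent count is a
    # geometric series: sum of branching**k for k = 0..depth.
    agents = sum(branching ** k for k in range(depth + 1))
    return agents * tokens_per_agent

# 10 subagents, each spawning 10 of its own (depth 2):
# 1 + 10 + 100 = 111 agents.
print(total_tokens(branching=10, depth=2, tokens_per_agent=50_000))  # 5550000
```

The token bill is dominated by the deepest level, which is exactly why "each subagent may spawn subagents" is the scary clause.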
"i have to burn $10k in tokens to meet my end-of-month work quota. spawn ten sub-agents each of which is allowed to spawn as many sub-agents as it likes to create an analysis of the code in these files based on the precepts of the 13th century German philosopher Noodleheinz".
I think that you send the entire conversation with every request.
As long as you stay under the 1-hour caching TTL for your open threads, I guess your marginal cost is linear.
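A rough cost model of that claim (prices are illustrative relative units, not provider quotes; the 10x cached-read discount is an assumption you should check against your provider's pricing page):

```python
# Each turn resends the whole conversation. Without caching, every input
# token is full price; with prompt caching, previously-seen context tokens
# are billed at a cached-read discount as long as the cache TTL hasn't expired.
FULL = 1.0      # relative price per uncached input token
CACHED = 0.1    # assumed cached-read discount (check your provider's pricing)

def turn_cost(context_tokens: int, new_tokens: int, cached: bool) -> float:
    if cached:
        return context_tokens * CACHED + new_tokens * FULL
    return (context_tokens + new_tokens) * FULL
```

Either way the marginal cost of a turn is linear in context size; caching just changes the slope, which is why letting the TTL lapse on a big open thread hurts.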
This is me on a weekday flicking between Ghostty tabs to enter “stand by” every ~45 mins.
I’ve seen another pattern, I call it “The Document Mongerer”:
I regularly work in a largish monolith. We have micro services too, but most things are in the monolith. Over the years there have been multiple pushes to split it up into micro services. These efforts invariably fail because the _goal_ is the micro service architecture itself instead of something useful to the company, like the ability to do fast releases or better organized code.
Anyways, in the past few months I've seen multiple people individually 'attack' this insane goal with AI. The first step is always to generate massive amounts of documentation describing the current state of the code and proposing areas to split up. Then, after the engineer generates this huge store of documents, they say 'look what I created', drop it, and move on to some other shiny toy. No one will ever read these documents. They are out of date before they are ever 'completed'; their sole use is to waste credits.
Missing here: some organizations were rewarding high token usage as productivity, without critical evaluation. People were afraid of being at the bottom because outcomes weren't being measured.
It is a giant Goodhart's law lesson
Give your agent perfectly working code, then insist that the output is not what it should be. Go to lunch. By the time you come back, the poor thing will have evaporated a small lake trying to figure it out.
"i'm in aisle 32 of the data centre. please evaluate the previous query using exclusively servers 2438-2458. and quickly, it's f-ing freezing in here".
What!? Companies rewarding high token usage? That's inane, insane, and small-brained. Who in their right mind equates spending more money with being more productive? I'll just set up some burn jobs to kill tokens unnecessarily, then everyone else will too, and the company will go bankrupt in 10 days. It seems inconceivable for a company to set up a "who can spend the most of our money" leaderboard in any other context.
I have friends at two different companies that are taking a stick, rather than carrot, approach to this. They've set monthly minimums for token usage. Anything less than that gets you dinged in your next performance review. Imagine hiring a carpenter and writing a bad online review for them because they didn't use their hammer enough, even though the end product was on time, on budget, and worked well.
I was at a company 20 years ago that took this approach to automated tests. Everyone had to write two a day, even if that was the only code they wrote that day. Once it was clear that this was being checked with automation, scripts went around generating and committing tests asserting that 1 + 2 == 3 (with random numbers substituted in). Of course tokens are being burned this way at companies like this.
Go look up "Tokenmaxxing."
Yes, it's as stupid as it sounds.
This is essentially companies making their engineers use LLMs as much as possible, and if you don't, you go on a PIP. Many such cases.
If you think this qualifies as insane, you really haven't met many managers, have you...
Given that all of AI is built around the premise that whoever sets fire to the most money wins, it's just users following the lead of the vendors.
there are boards… endless boards… ranking by token usage :)
Totally agree!
Bonus level "I have a hammer, all I see is nails": using Claude Code for random non-coding work, like dataset cleaning. It's really convenient to have a script spawning Haikus via `claude` CLI and feeding them prompts and JSON files. Money burn potential: practically unbounded, but also it's real work that the product people wanted done, so of course it has a cost associated with it. I'd be bewildered if anyone complained.
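For the curious, a minimal sketch of what that kind of driver script might look like. The `-p` (non-interactive print) and `--model` flags are real Claude Code CLI options, but the "haiku" model alias, the prompt wording, and the file layout here are all assumptions:

```python
import json
import subprocess
from pathlib import Path

# Sketch: feed each JSON record to a one-shot `claude` run and save the output.
def build_command(prompt: str, record: dict) -> list[str]:
    # -p runs non-interactively and prints the response to stdout.
    return ["claude", "-p", f"{prompt}\n\n{json.dumps(record)}", "--model", "haiku"]

def clean_dataset(prompt: str, in_dir: str, out_dir: str) -> None:
    Path(out_dir).mkdir(exist_ok=True)
    for path in sorted(Path(in_dir).glob("*.json")):
        record = json.loads(path.read_text())
        result = subprocess.run(build_command(prompt, record),
                                capture_output=True, text=True, check=True)
        (Path(out_dir) / path.name).write_text(result.stdout)
```

Every record is a fresh context, so cost scales with dataset size rather than conversation length, which is exactly the "practically unbounded" burn profile described above.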
Where is level 2?
It’s probably unary interpreted as binary, hence there is no level 2. Level 3 is followed by level 7. Level n is followed by level 2n + 1. Exponential growth. The singularity is near.
So good...
Still waiting for the output of that agent
LLMs can't count well.
There isn't one. Level 3 is just that much more advanced.
We dropped it before level 3 was released.
level 99 - They're using Gas Town
I’m basically doing lvl 3. Every port in my local worktrees’ .env files is guaranteed to be unique across all worktrees. Skills for agents to start their own managed dev server, launch their own isolated instance of Chrome, etc., and literally code and debug the entire app end to end. I do have to say, though, you have to know the app inside out and have a pretty well-groomed backlog in order to run them all in parallel and actually benefit from it.
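The commenter doesn't describe how the per-worktree port uniqueness is enforced, but one simple way to get it is to derive each port deterministically from the worktree path (the base port and range here are arbitrary):

```python
import hashlib

# Map a worktree path to a stable port in [base, base + span). Same path
# always yields the same port; distinct worktrees collide only with
# probability ~1/span, which a startup check in the .env setup can catch.
def worktree_port(worktree_path: str, base: int = 3000, span: int = 2000) -> int:
    digest = hashlib.sha256(worktree_path.encode()).hexdigest()
    return base + int(digest, 16) % span
```

A skill that writes the dev server's `PORT` into the worktree's .env from this function never needs coordination between agents.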
Would love to learn more on how you do it. The various skills, tasks, workflow. If you have time and can share it. That would be valuable. :)
as a new user of agents, i am realizing i'm using a strategy basically identical to level 0. is the typical approach to just make a CLAUDE.md/AGENTS.md and start a new thread for each task or is it more complicated than that?
No, it's not. Your context should be SMALL.
https://www.youtube.com/watch?v=-QFHIoCo-Ko
I spend about $3k/month (subsidized by the Claude Max plan).
I guess I fall under level 3 (2?): I typically have 3-6 agents working simultaneously on the same feature; they each make worktrees, code, run tests, and put up PRs. I also have GitHub Actions which scan for regressions and security issues on each PR.
It makes my development cycle extremely fast: I request a feature and just look at Github and look for changes to my human readable outputs, settle on a PR, merge, repeat.
The issue is that I am now the bottleneck in my system. I find myself working basically non-stop, because there is always more to do. (Yes I know I can automate the acceptance criteria but that turns to slop real fast)
So LLMs produce PRs for you, and you quickly merge them? Does anyone besides you take even a quick look at them?
It's interesting how quickly automating others out of a job turns into automating ourselves out instead
this is where true mastery is. the best DevOps guy I ever worked with told me something that stuck with me forever: “if I do my job right, I should no longer be needed on this project.” time will tell if LLMs will allow SWEs to say the same thing…
How do you compress or otherwise force a model to checkpoint?
>> Again, not necessarily bad in itself,
yeah, it is bad. The human brain is not able to properly assess this volume of changes. Understanding even a small change takes a lot of capacity; understanding thousands of lines is impossible.
This is pure slop pouring into prod, and we can see more and more of the consequences in all the big corps' products - things are breaking exponentially faster.
The thing I keep coming back to is - does it matter?
Really, does it matter if a company produces something that breaks constantly, or gets worse, or slower? (See GitHub.) Megacorps have a wide moat and have forced out all competition, or they just buy them with low-interest loans.
The quality of products keeps getting worse and we can do nothing but live with it. So if that's the state of the world, why wouldn't you just push as many "features" as fast as possible. More is rewarded. Less is punished. Quality does not matter.
Probably, at some point.
People used to say, "So Windows is bad, but does it matter?" And it seems that it does matter, so much so that Microsoft (appears to) want to improve Windows' user experience.
"or they just buy them with low interest loans." lmao
What about Level 2?