I take a peek every month or so at spend for my company and notice more and more people are consuming $1k in tokens a month, and it is bewildering to me how. I use LLMs daily, and see anywhere from $200-$400 tops. This is using the most expensive models, in deep thinking mode. So I'm not a Luddite against the usage of them. I just can't figure out _how_ to burn that much money a month responsibly.
I genuinely challenge someone spending $5-$10k a month to demonstrate how that turns into $50-$100k in value. At a corporate level, I'd much rather hire a junior engineer who spends $100-$200/month and becomes productive than try to rationalize $100k/year in token spend.
> I just can't figure out _how_ to burn that much money a month responsibly.
From my experience, this happens essentially by three means:
- Level 0 (beginner users) long-lived conversations: If you don't get in the habit of compressing, or otherwise manually forcing the model to summarize/checkpoint its work, you will often end up perpetually reusing the same conversation. This is especially true for _beginners_ who did not spend time curating their _base_ agent knowledge. They end up with a single meta conversation with huge context where they feel the agent is "educated", and feel like any new conversation with the agent is a loss of time because they have to re-educate it.
- Level 1 (intermediate users) heavy explicit use of subagents: Once you discover the prompt pattern of "spawn 5 subagents to analyze your solution, each analyzing a different angle, summarize their findings", it can become addictive. It's not a bad habit per se, but if you're not careful you can drastically overspend your credits.
- Level 3 (expert users) extreme multitasking: Just genuinely having 10 worktrees perpetually in parallel and cycling between them in between agent responses. Again, not necessarily bad in itself, but can exponentially consume credits.
> Just genuinely having 10 worktrees perpetually in parallel and cycling between them in between agent responses. Again, not necessarily bad in itself, but can exponentially consume credits.
I’ve seen another pattern, I call it “The Document Mongerer”:
I regularly work in a largish monolith. We have micro services too, but most things are in the monolith. Over the years there have been multiple pushes to split it up into micro services. These efforts invariably fail because the _goal_ is the micro service architecture itself instead of something useful to the company, like the ability to do fast releases or better organized code.
Anyways, in the past few months I’ve seen multiple people individually ‘attack’ this insane goal with AI. The first step is always to generate massive amounts of documentation describing the current state of code and proposing areas to split up. Then, after the engineer generates this huge store of documents, they say ‘look what I created’ and then drop it and move on to some other shiny toy. No one will ever read these documents. They are out of date before they ever get ‘completed’; their sole usage is to waste credits.
Missing here: some organizations were rewarding high token usage as productivity without critical evaluation. People were afraid to be in the bottom because outcomes weren't being measured.
Bonus level "I have a hammer, all I see is nails": using Claude Code for random non-coding work, like dataset cleaning. It's really convenient to have a script spawning Haikus via `claude` CLI and feeding them prompts and JSON files. Money burn potential: practically unbounded, but also it's real work that the product people wanted done, so of course it has a cost associated with it. I'd be bewildered if anyone complained.
I’m basically doing lvl 3. There’s not a single port in my local worktree’s .env that’s not guaranteed to be unique across all worktrees. Skills for agent to start their own managed dev server, launch their own isolated instance of chrome etc. literally end-to-end code and debug the entire app. I do have to say though you have to know the app inside out and have to have a pretty well groomed backlog in order to run them all in parallel and actually benefit from it.
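A minimal sketch of one way to get guaranteed-unique ports per worktree, assuming each worktree is handed its own fixed-size block of ports by position in a sorted list. The base port, block size, service names, and paths are all hypothetical, and nothing here is the commenter's actual setup.

```python
def assign_ports(worktrees, services, base_port=20000, block_size=10):
    """Give every worktree its own contiguous block of ports.

    Uniqueness is guaranteed because blocks never overlap: worktree number
    `slot` owns ports [base_port + slot * block_size, ... + block_size).
    """
    if len(services) > block_size:
        raise ValueError("block_size is too small for the number of services")
    plan = {}
    for slot, worktree in enumerate(sorted(set(worktrees))):
        start = base_port + slot * block_size
        plan[worktree] = {name: start + i for i, name in enumerate(services)}
    return plan


def render_env(ports):
    """Render one worktree's ports as .env lines, e.g. WEB_PORT=20000."""
    return "".join(f"{name.upper()}_PORT={port}\n" for name, port in ports.items())


# Hypothetical worktree paths and service names, for illustration only.
plan = assign_ports(["/src/app-wt2", "/src/app-wt1"], ["web", "api", "chrome_debug"])
print(render_env(plan["/src/app-wt1"]), end="")
```

One trade-off of assigning by sorted position: adding or removing a worktree can shift the blocks of the others, so a setup with long-running dev servers would probably want to persist the slot numbers instead.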
as a new user of agents, i am realizing i'm using a strategy basically identical to level 0. is the typical approach to just make a CLAUDE.md/AGENTS.md and start a new thread for each task or is it more complicated than that?
I spend about $3k/month (subsidized by the Claude Max plan).
I guess I fall under level 3 (2?): I typically have 3-6 agents working simultaneously on the same feature; they each make worktrees, code, run tests and put up PRs. I also have GitHub Actions which scan for regressions and security issues on each PR.
It makes my development cycle extremely fast: I request a feature and just look at Github and look for changes to my human readable outputs, settle on a PR, merge, repeat.
The issue is that I am now the bottleneck in my system. I find myself working basically non-stop, because there is always more to do. (Yes I know I can automate the acceptance criteria but that turns to slop real fast)
yeah, it is bad. Human brain is not able to properly assess this amount of changes. To understand even a small change you need a lot of capacity. To understand thousands of lines - impossible.
This is pure slop pouring into prod, and we can see more and more consequences of this in all the big corps' products: things are breaking more and more often, and faster.
First: There's the obvious "If the company is letting me do it, I'll be wasteful." This includes not clearing/compacting the context often. Opus now has a 1M context window, and quality is good to at least 200K. So each query is burning a lot of tokens until you clear/compact.
People have already mentioned the size/complexity of the codebase. I'm new to my team and the codebase isn't huge, but it's large enough that there are plenty of parts I have little understanding about. When I'm given a task, then yes, I definitely go to Claude and ask it to find the relevant parts of code so I can understand the existing workflow before even attempting to change it.
The downside is that I don't build expertise. But the reality is that with Claude, I can get the work done in 1 day that would take me 5 days of struggling, and if everyone is doing it, I can't be left behind. So I take the middle route - I get it done in 2-3 days instead of 1 so I can at least spend some time with the code.
Especially with AI, the rate at which code changes in our codebase is insane. So I built a tool that takes a pull request, and tells the LLM to go deep and explain to me what that pull request does. (Note: I'm not the reviewer, I just want to keep tabs on the work that is going on in the team).
And this is just the beginning. I haven't actually spent time to come up with more ways to use the LLM to help me.
My usage is similar to yours, but if I were fairly experienced with the code base, I'd do a lot more. I haven't asked, but I suspect there are people in my team who go over $1K/month.
As always, the bottleneck is proper testing and reviews.
Edit: I'll also add that for not-so-important code used within the company, I suspect most people are going full-AI with it. For my personal (non-work) code, I just let the AI code it all - the risk is usually very low (and problems are caught quickly). If someone is using the "superpowers" skill, then even for basic features you can burn lots of tokens. I usually start with 20-40K tokens and end up with 80-90K tokens when it's finished. Which means that many of the requests prior to completion were sending in close to 80K tokens. Multiply that with the number of queries, etc.
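To put rough numbers on the "multiply that with the number of queries" point, here is a minimal sketch. It assumes the context grows roughly linearly from the first request to the last, that every request resends the whole context, and it ignores caching entirely. The 50-request count is a hypothetical figure; only the 30K and 90K endpoints are in the range mentioned above.

```python
def cumulative_input_tokens(start_tokens, end_tokens, requests):
    """Total input tokens sent over a session with linearly growing context.

    This is an arithmetic series: requests * (first + last) / 2.
    """
    if requests < 1:
        raise ValueError("requests must be at least 1")
    return requests * (start_tokens + end_tokens) // 2


# Context grows from 30K to 90K tokens over a hypothetical 50 requests.
print(cumulative_input_tokens(30_000, 90_000, 50))  # 3000000
```

So a session that "ends at 90K" can still have sent millions of input tokens in total, which is where the bill comes from.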
> This includes not clearing/compacting the context often. Opus now has a 1M context window, and quality is good to at least 200K. So each query is burning a lot of tokens until you clear/compact.
I see this repeated by others, including coworkers. It completely ignores caching. Caching itself is complicated, but the "longer context window = more expensive" is not 100% true and you are hampering yourself if you're not taking full advantage of large context windows.
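A rough sketch of why caching changes the picture, assuming an append-only conversation where tokens already seen in the prefix are billed at a discounted cache-read rate and new tokens at a cache-write rate. The price and both multipliers are illustrative placeholders, not any provider's actual rates, and real caching has more rules (expiry, minimum sizes) than this models.

```python
def uncached_cost(context_sizes, price_per_mtok):
    """Input cost when every request pays full price for its whole context."""
    return sum(context_sizes) * price_per_mtok / 1_000_000


def cached_cost(context_sizes, price_per_mtok, read_mult=0.1, write_mult=1.25):
    """Input cost when the previously sent prefix can be read from cache.

    read_mult and write_mult are hypothetical multipliers on the base price.
    A request smaller than the previous one is treated as a rewritten prefix
    (e.g. after compaction), so it is billed as a full cache miss.
    """
    weighted_tokens = 0.0
    cached = 0
    for size in context_sizes:
        reused = cached if size >= cached else 0
        new = size - reused
        weighted_tokens += reused * read_mult + new * write_mult
        cached = size
    return weighted_tokens * price_per_mtok / 1_000_000


sizes = [100_000, 110_000, 120_000]  # hypothetical context size per request
print(uncached_cost(sizes, 10.0), cached_cost(sizes, 10.0))
```

Under these assumptions the long, stable context is cheaper than its raw size suggests, while compacting mid-session throws the cached prefix away and pays full price again on the next request.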
> But the reality is that with Claude, I can get the work done in 1 day that would take me 5 days of struggling,
Is it really a 5x ROI? Where are all the apps, games, platforms, SaaS products, and features that have been backlogged for 5 years that are all of a sudden getting done? Because I see a modest ROI, and an _awful lot_ of shovelware.
> First: There's the obvious "If the company is letting me do it, I'll be wasteful." This includes not clearing/compacting the context often. Opus now has a 1M context window, and quality is good to at least 200K. So each query is burning a lot of tokens until you clear/compact.
What is wasteful? If you are costing the organization $x/hr, and spend an hour saving the company $(x*0.5), you didn't save money, you wasted it.
To the company, are you spending more time being token efficient to save less money than they're paying you for the time? That's not even getting into opportunity costs.
There is some extreme wasteful spending of AI tokens out there. But trying to get below $3k/month in token costs is often of questionable value.
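The arithmetic above as a tiny sketch. The $100/hr figure is a hypothetical stand-in for x, and the function name is made up for illustration.

```python
def net_benefit(hourly_cost_usd, hours_spent, token_savings_usd):
    """Token savings minus the cost of the time spent achieving them.

    Positive means the optimization paid for itself; negative means the
    time cost more than the tokens it saved.
    """
    return token_savings_usd - hourly_cost_usd * hours_spent


x = 100.0  # hypothetical fully loaded hourly cost
print(net_benefit(x, 1.0, 0.5 * x))  # -50.0
```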
I have anecdotal examples of Claude Code choosing a solution to a problem that is ridiculously token inefficient.
One example - was giving several agents different sub problems to solve in a complex ML / forecasting problem. Each agent would write + run + read a jupyter notebook. This worked ok, the notebooks would be verbose but it was fine... until one of them wrote out hundreds of thousands of rows to a cell output, creating a 500MB ipynb file. Claude tried several times to read it and it used my entire context limit.
The solution was to prescribe a better structure for doing the work (via CLI analysis scripts + folders to save research results to). But this required some planning, thought, and design work by me, the operator.
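As one possible guard against the runaway-notebook case, here is a minimal sketch that trims oversized cell outputs before an agent is pointed at the file. It assumes the standard nbformat 4 JSON layout (`cells`, each code cell with an `outputs` list). The function name and the 2000-character threshold are made up, and this is not the structure the commenter actually used.

```python
import json


def trim_notebook_outputs(notebook_json, max_chars=2000):
    """Replace any cell output larger than max_chars with a short placeholder."""
    nb = json.loads(notebook_json)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        trimmed = []
        for output in cell.get("outputs", []):
            size = len(json.dumps(output))
            if size <= max_chars:
                trimmed.append(output)
            else:
                # Keep a stream-style note so the reader knows something was cut.
                trimmed.append({
                    "output_type": "stream",
                    "name": "stdout",
                    "text": [f"[output of {size} characters omitted]\n"],
                })
        cell["outputs"] = trimmed
    return json.dumps(nb)
```

The script still has to load the whole file, but that costs local memory rather than model context.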
When I see people spending $10k a month in tokens, I can only assume they are taking lazy hands off approaches to solving problems with the expensive hammer that is claude code. EX: have claude read all your emails every day... the lazy solution is to simply do that, but a smarter solution is to first filter the email body HTML to remove the noise.
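A minimal sketch of the "filter the email body HTML first" idea, using only the standard library's `html.parser`. The class and function names are made up for illustration, and real email HTML would likely need more rules (quoted replies, signatures, tracking footers).

```python
from html.parser import HTMLParser


class _VisibleText(HTMLParser):
    """Collect text nodes, skipping anything inside script/style/head."""

    SKIPPED = {"script", "style", "head"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.parts.append(text)


def email_html_to_text(html):
    """Return the visible text of an HTML email body with whitespace collapsed."""
    parser = _VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


sample = "<html><head><style>p{color:red}</style></head><body><p>Hello <b>world</b></p></body></html>"
print(email_html_to_text(sample))  # Hello world
```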
To be fair, I do that. 2-3 times a day, in fact. Not all of my emails (the archive has ballooned to several hundred thousand messages total), but the most recent ones certainly.
My standard prompt is along the lines of "go through the last N days of my emails, identify all threads that I need to know about, action on or follow up with". N is usually a number between 2 and 5. I've specified a standing set of rules for recognizing likely sources of noise, to aid in skipping the bot spam.
The company is charged API pricing through an enterprise contract, and I remain persistently curious how much I burn. My daily admin-related token expenses appear to fluctuate between $1 and $5. For something that saves me up to 2h of time a day I consider that a rather tolerable deal. (When I dive in to code to do refactors or deep investigations, I can spend as much as $25 a day.)
If it’s very large, especially if the tool needs to refer to documentation for a lot of custom frameworks and APIs, you often end up needing very large context windows that burn through tokens faster.
If it’s smaller or sticks with common frameworks that the model was trained on, it’s able to do a lot more with smaller context windows and token usage is way lower.
The codebase and the topic you're working on are huge variables.
I don't use LLMs to write code (other than simple refactors and throwaway stuff) but I do use them heavily to crawl through big codebases and identify which files and functions I need to understand.
Some of the codebases I explore will burn through tokens at a rapid rate because there is so much complex code to get through. If I use the $20 Claude plan and Opus I can sometimes go through my entire 5-hour allocation in a single prompt exploring the codebase, and it's justified.
Other times I'm working on simple topics, even in a large codebase, and it will sip tokens because it only needs to walk a couple files to get to what it needs to answer my questions.
I'm currently in repos where the context window required is so large that the output is almost always "wrong" for the problem at hand. Quite a few people at my company burn through tokens this way, and it certainly isn't providing value to the company.
This raises the question of whether we should move to minimal microservices so that a whole project fits in the LLM's context. I hardly have to do anything when I'm working on a small project with an LLM.
Yes, in a reasonable microservice land where the places you need to connect to are all documented in very concise places, you can have extremely productive $10 days. In the giant monorepo with everything custom, you can't just rely on built-in knowledge of 80% of your libraries, so it's a very different world.
A place like Google has to be so much better off just training library concepts in, given how many of the things the LLM will "instinctively" reach for are unlikely to be available. Not unlike the acclimation period that happens when someone comes into or out of a company like that, and suddenly every library and infra tool you were used to is just not available. We need a lot more searching when that happens to us, and the LLM suffers from the same context issue. The human just has all of that trained in after 6 months, but the LLM doesn't.
> I just can't figure out _how_ to burn that much money a month responsibly.
Same but in regards to quotas. I'm on the 200 EUR ChatGPT plan, so presumably have the highest quota, using the "most expensive" models, on highest reasoning, in fast-mode (1.5x quota usage) and after a full day of almost exclusively doing programming with agents, I still get nowhere close to hitting my quota.
In fact, since I started using agents for coding, the only time I even got close, was when I was doing cross-platform development with the same as above, but on three computers at the same time, then I almost hit my weekly quota. But normally, I get down to ~20% of the quota but almost never below that. I don't see how I could either, I'm already doing lots of prompts and queries "for fun" basically.
Codex quota is suspiciously high right now. Either way, the subscription plans are not sustainable, and perhaps less relevant to any discussion about corporate API use. The prosumer developer plans are an insane deal. It is a golden age right now and it will end. If you tried to use the APIs to achieve the same thing, you would be spending thousands upon thousands of dollars a month. My completely unfounded conjecture is that OpenAI is trying to grab developers back from Claude by burning $$$$.
I am running a bunch of autoresearch loops that optimize various compilers, and it's pretty easy to burn through as much money as you want if you have a measurable goal and good tests.
There are tools that let you extract out what the API price would be for a subscription plan use. I typically have monthly runs that are on the order of $2k - $4k at API prices, despite paying a mere $200/mo to Anthropic.
Edit: Just checked with ccusage and I've been doing about $450/day for the last week. A bit more than usual, but I still haven't come close to weekly limits and never hit the 5hr rate limit.
> Same but in regards to quotas. I'm on the 200 EUR ChatGPT plan,
The API rates and monthly plan rates are not the same.
If you're using enough to justify the 200EUR plan (instead of the 100EUR plan), your use might actually be as high as some of the API bills discussed above.
I'm on the same page. Do people not analyze the problems themselves? Are they just copy/pasting their entire ticket description into Claude Code and having it iterate until they land on something that works?
Not what I do. I'll reformulate the ticket description so that the purpose and as many details as possible about the solution are made clear from the start. Then I tell Opus to go and research the relevant parts of the codebase and what needs to be done, and write its findings to a research.md file. Then I'll review that file, bring answers to any open questions and hash out more details if any parts seem fuzzy. When the research is sound I'll ask Opus to produce a plan.md document that lists all the changes that need to be made as actionable steps (possibly broken into phases). Then I'll let Sonnet execute the steps one by one and quickly review the changes as we go along.
> Are they just copy/pasting their entire ticket description into Claude Code and having it iterate until they land on something that works?
"Their ticket" = that was AI generated.
After which they will wait for their AI-generated PR to be checked by an automated AI QA that will validate it against the AI-generated spec.
It feels like an important metric of "corporate AI adoption" should be how effective the human is at steering the AI.
IF THE HUMAN ISN'T EFFECTIVE, THE HUMAN NEEDS TO GO.
If it manages to produce a working solution, then great! Why would you waste your time on it?
If it fails, then great too! You find your value by solving the ticket, which can be a great example of where a human can still prevail over the AI (joke: AI companies might be interested in buying such examples).
(All assuming that your time cost is pricier than token spending. Totally different story if your wage is less than token cost)
Actually no. We ask business analysts to supply documentation for whole products. We use AI to analyze that documentation and after that we use AI to create tasks in Jira. Business analysts will review them.
After that we use AI to translate the tasks to a more technical view.
After that we use AI to implement the tasks.
After that we use AI to review the tasks.
After that a human QA tests the tasks.
If all is good, the code is merged and lands in production.
And yes, we burn a lot of tokens but the process is very fast. It takes months instead of years.
> Are they just copy/pasting their entire ticket description into Claude Code and having it iterate until they land on something that works?
There's also the pattern of creating an army of agents to solve problems. A human writes a plan. One agent elaborates on it. Another reviews it and makes changes. Another splits it up into tasks and delegates them out to multiple agents who make changes. Yet another agent reviews the changes, and on and on. All working around the clock.
If Uber is like most other companies, there's a leaderboard for AI tokens consumed. If maximizing your token usage is going to get you to the top of the leaderboard, and therefore promoted for "productivity", people are going to find creative ways to be "productive".
One thing that stands out is that it sounds like you're using LLMs for only one part of your process. You're having LLMs help you write code, but the code you're writing doesn't itself make use of LLMs.
My current job basically involves trying to improve processes that themselves make heavy use of LLMs. Once you have multiple agents in parallel running multiple experiments on improving the performance of primarily LLM driven tools it's not that hard to get your token usage pretty high.
Claude is a mediocre programmer that can do great things with great supervision, but it can't make mediocre human programmers into good ones, because they can't provide great supervision.
I'd bet it's the LLM doom loop: vaguely ask it to do something, tab to news.ycombinator.com for 30 minutes, tab back, notice it misunderstood the prompt. Restart with a new, improved prompt, tab back to HN.
So yeah, probably the same thing people did anyway, just that instead of compile time it's now generating time.
Several options on how to burn that amount of money without specifically looking to tokenmaxx:
- Agents that spawn other agents
- Telling agents to go look at the entire codebase or at a lot of documents constantly
- MCP/API use with a lot of noise
- Loops where the agent is running unattended.
I do think it's not really responsible use and a loop where the agent is trying to fix CI for one hour for something that would take you five minutes (for example) is absurd. But people do that.
One of the new dynamics is a loop between a "code review" LLM and a "fix" LLM. It's super annoying because the code review LLM often finds more bugs on a follow-up review that were there from the beginning, but at least I can loop both until checks go green.
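A minimal sketch of a review/fix loop with a round cap and a spend cap, so it cannot run unattended forever. `review` and `fix` are hypothetical callables standing in for whatever invokes the two LLMs: `review` is assumed to return a list of open issues, and `fix` to return what that fix attempt cost in USD.

```python
from dataclasses import dataclass


@dataclass
class LoopResult:
    status: str       # "green", "max_rounds", or "budget_exhausted"
    rounds: int       # number of fix attempts made
    spent_usd: float  # total cost reported by fix()


def review_fix_loop(review, fix, max_rounds=5, budget_usd=20.0):
    """Alternate review and fix until clean, or until a cap is hit."""
    spent = 0.0
    rounds = 0
    while True:
        issues = review()
        if not issues:
            return LoopResult("green", rounds, spent)
        if rounds >= max_rounds:
            return LoopResult("max_rounds", rounds, spent)
        # The budget is checked before each fix, so the final spend can
        # overshoot the cap by at most the cost of one fix attempt.
        if spent >= budget_usd:
            return LoopResult("budget_exhausted", rounds, spent)
        spent += fix(issues)
        rounds += 1
```

Returning a status instead of looping until green makes the "agent fighting CI for an hour" case visible: the caller can hand the remaining issues to a human once a cap is hit.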
I spend 400-500 dollars per day during active development at this point. However with more aggressive task breakdowns I can spend ~5k per day.
These spend rates are in part due to operating on a larger code base. Operating on a larger code base means more time searching and understanding the code, tests, test output. They are also due to going all-in on agentic coding.
It can feel painfully slow to go back to coding by hand when for a dollar you can build the same functionality in a minute. Now do this with multiple sessions and you can see where the cost goes.
I've been working on a project to build a new Postgres based database in Rust[0]. I'm four weeks in and have 93% of the Postgres test suite passing. I've found agents to have worked really well for this as I have an existing codebase that has good architecture that I can point my agents at. It's also easy to debug as I can diff what my agents are doing and what Postgres is doing.
I've had to get multiple codex accounts, but there was a brief period of time where I tried API usage to see how expensive it would be. In about an hour I spent $650 of credits. I had codex estimate how much I would be spending if I was doing pure API usage and it estimated around $10k/week.
For context, Postgres is 1M lines of C code. It's looking like pgrust will come out at fewer lines of code than Postgres, and at peak I was adding over 100k lines of code in a day. I would estimate it would take a team of 5 software engineers at least 3 years to get to where I got in a month with a couple of Codex subscriptions.
There’s your problem. You’re trying to be responsible instead of trying to burn tokens so you can have your name on top of some leaderboard for most wasteful AI users.
I don't use automated agent workflows or anything, I just use Claude as a pair programmer of sorts. A month or so ago I used Claude Opus 4.6 for 2-4 hours on API pricing and racked up $20 in spend, which surprised me since that was much higher than my usual.
I don't know about $10,000, but I can see hitting $1,000 pretty easily if you aren't looking at the costs.
It turns out writing good prompts helps to keep token usage down as the model wastes fewer tokens discovering context it needs that wasn't hinted at in the prompt.
Whereas a good prompt will give solid leads to all the specifics needed to complete the task.
>I genuinely challenge someone spending $5-$10k a month to demonstrate how that turns into $50-$100k in value
At a lot of businesses $5-10k/mo of AI spend doesn't even translate into $5-10k/mo of value. Churning out code was rarely the business value bottleneck. It was convenient for everybody else to blame developers not writing code fast enough for their failures. Now they have no excuse, but I doubt they will own up.
Multimedia feedback can burn much more than that, for example if I'm sending frames of a 3D engine's output. I would like to send it a video if I could, but that is too expensive, though I'm sure there are orgs out there that really do want every frame in a prompt doing something. This can be exponential depending on the application. I recently wrote a Milkdrop visualization analyzer. I could have sent thousands of frames for each one. I didn't, but I wish I could, haha.
Yeah, I use Claude Code to do security reviews. For every CVE that Wiz flags, I have Claude Code check for reachability analysis.
I typically consume about $200/month doing this. Most of our engineers are in the $200-400 range, with a few people around $1,000.
But then there's one guy who's not only hitting $8,000, but supposedly has nearly 300,000 lines of code accepted (Note: This means he's accepted the lines of code from Claude, not that he's committed it). I can't figure out how.
Do lots of deep research and code reviews on large legacy codebases. I've created lots of documentation to reduce token consumption but it's still a lot of token consumption.
The answer may be agentic loops that keep cycling through the same problem again and again until they land on a non-erroneous outcome. Some people boast of having multiple such agents working in parallel on different problems, tending to one while another is processing, perhaps not unlike the movie mad scientist who runs around the lab throwing switches while laughing maniacally at the prospect of his impending success.
There was a tool posted called codeburn that showed a breakdown of what activity your usage was spent on. Mine was almost all coding but other people in the thread said >50% of their usage was conversation. I’m inclined to agree with you that someone who is reasonable with their compute usage is likely to be thinking things through rather than just burning tokens to get an LLM to solve the problem
In addition to what folks are saying here about larger code bases and multiple features at once, there’s also the time requirement to be efficient. It takes time to be more efficient with token usage and it may not be worth it for some of these companies so… burn away until we start to get more data and then we’ll check in.
> I just can't figure out _how_ to burn that much money a month responsibly.
I always have a few agents (2-5) doing research and working on plans in parallel. A plan is a thorough and unambiguous document describing the process to implement some feature. It contains goals, non-goals, data models, access patterns, explicit semantics, migrations, phasing, requirements, acceptance criteria, phased and final. Plans often require speculative work to formulate. Plans take hours to days to a couple of weeks to write. Humans may review the plans or derived RFCs. Chiefly AI reviews the code (multiple agents with differing prompts until a fixed point is reached between them). Tests and formal methods are meant to do heavy lifting.
In my highest volume weeks, I ship low hundreds of thousands of lines of software not counting changes to deps.
> At a corporate level, I'd much rather hire a junior engineer
Any formulation of a problem sufficient for a truly junior engineer to execute is better given to an agent. The solution is cheaper, faster, and likely better. If the latter doesn't hold, 10 independent solutions are still cheaper and faster than a junior engineer.
There is no longer any likely path to teaching a junior engineer the trade.
I am sorry, I am probably just very dumb, but this sounds extremely wasteful. If this is a reflection of how software was made before AI I wonder how anything was ever made.
> In my highest volume weeks, I ship low hundreds of thousands of lines of software not counting changes to deps.
I'm suspicious that you actually get Claude to output that much usable code in a week, but maybe you do.
But I’m 100% positive that you’re not shipping even a small fraction of the amount of value that someone reading this 2 years ago would have expected from hundreds of thousands of lines of code.
> In my highest volume weeks, I ship low hundreds of thousands of lines of software not counting changes to deps
But what do they actually do?
I keep seeing people wax poetic about the mountains and mountains of code that LLMs are dumping out, but I'm yet to see anywhere near a proportionate amount of actually useful new apps or features. And if anything, the useful ones I do find are just more shovels for more AI. When do we get to the part where we start seeing the 10x gains from the billions of lines of code that have probably been generated at this point?
I dunno, I've seen agents make boneheaded mistakes even a junior engineer wouldn't make. Treating them as strictly better than junior engineers is a problem, not just for that reason but because you're effectively killing the pipeline for senior engineers. Then what?
On the OpenAI side, GPT-5.5 generates spend at a prolific rate that's even faster if you use it through an ACP connection in a tool like Zed. I used to never think about Codex rate limits and now I'm hitting mine every 5 hour block and spending ~$100/day on top of that in adhoc credit purchases.
I use it as an IDE. I am a security engineer, but there are a bunch of predictable things I need to write code for: onboarding logs, writing detection rules, SOAR-type stuff. It makes a diff and locally tests all the permutations I describe, then I review the code.
I don't think it's about value. Tokenmaxxing is a thing now since that one CEO said he wants his $250k/yr devs to use $400-$500k/yr in tokens, so now it's all about how many agents can you have running concurrent tasks all day long.
In our org it's people that have too much stuff in their context, every mcp in the world installed, GTD, PAI, OpenClaw. I'm equally baffled how one can spend that much money during their day to day.
I also don't think a lot of people know some of the more advanced context management tricks like /rewind /fork /tree to take advantage of prefix caching
I could argue in all the ways my personal experience disagrees, but let's just apply Occam's razor:
Most people agree big orgs regularly have dysfunctional incentives. We've seen it happen a thousand times.
Your suggestion requires we also assume a 10x faster delivery time by people spending 200$ vs 1000$ - something I've yet to witness or hear a credible account of.
So while that might be true in a small number of cases, in general it's foolish to go with the "10x delivery speed" hypothesis.
It turns into 50k to 100k or more of value for the employee the moment upper management made AI spend a personal performance target across most corporations.
They keep forgetting to put "make no mistakes", "think deeply" and "get it right the first time" in their prompts.
When people have no ability to understand what they are doing, they will just rerun it endlessly hoping they get something passable. When that doesn't happen they burn money.
I doubt most of this is from rerunning the same prompts over and over. This token burn is more likely from people using swarms of agents and orchestrators for “efficiency”.
“I’ve got 2 dozen agents churning through the backlog to build this feature that would take one agent an hour to implement.”
That would be true in a sane world with investors who value profitability. But everything is now focused on DAU and the network effect. Overusing their services might actually make them look better to investors who shovel more money to them to light on fire.
Idk about Uber in particular, but aside from legit programmers using AI to help them do legit work faster, there are people spamming it for metrics. And the hiring pipeline has gotten screwed up somehow, like half the people who reached the onsite interview for a technical role lied about all their technical skills, or they didn't lie and manage to pass hiring but then only take the tasks that AI can solo. And if it can't, waste tokens until giving up.
It really depends on the way you use AI. If you just prompt it for a task and either accept or reject the output, you won't spend much.
But if you are like me, you aggressively document and brainstorm before planning, you review that documentation with subagents, make modifications, you aggressively plan, you verify that plan with subagents, make modifications, have a large number of phases, plan again for each phase, write tests to cover 100%, implement each phase, do intermediate and final code reviews with subagents, apply fixes, write final documentation, and do all of these in parallel. If you have multiple tabs in your terminal each running Claude Code for 10-12 hours a day, then $5000 per day is not much.
If you use Anthropic or Open AI subscription and you spend $1000 per month, you are not using AI much.
I spent $24,096.47 in "API" costs with my $200 Claude Code Max subscription in April.
I'm building my own SaaS. I spent 6 months writing the code by hand before using Claude, and that was fine, but it's much faster to give the exact specs to Claude and have 3-4 sessions working in parallel with me. When you validate changes with exact test specs there's much less correction you need to do. I always hit my weekly limit and it's far cheaper for me to use this than to hire someone and spend time onboarding them.
Slop architecture leads to compounding problems that people try to solve with more slop. If one wants to control the quality of the code then the throughput and multithreading is bottlenecked by how much code one can comprehend in a given period of time.
I used CC frequently for development, Opus 4.7 with high thinking, with a $100 Max subscription, and haven't been rate limited yet. IMO a subscription is the way to go as it puts a ceiling on spending.
My observation is that pasting long documents is a great way to burn tokens. Turn-based conversation, even a very deep and technical one, consumes fewer tokens than "read these logs and tell me where the problem is". Ironically, the log reading example is a perfect use for a local LLM.
You are probably guiding them step by step and reading the results. Maybe you also sit and wait for the results.
Agents can iterate on a problem for hours if they can see their results and be given a higher level goal to evaluate their progress toward.
When you have an agent working for minutes or hours, never wait on it. Use that time to spin up another agent.
You can also spin up several agents in parallel to attempt the same item of work and compare their results to choose which to work off for next steps, instead of rolling the dice on a single option at a time and gambling that it's better to refine that first attempt instead of retrying from the start several more times.
And if you are doing QA manually, you're missing out on having e.g. Codex's "Computer Use" or "Browser Use" automate your manual verification steps and collect a report for you to review more quickly. Codex can control multiple virtual cursors simultaneously in the background without stealing focus, to parallelize this.
If you want to use up more tokens to get more done (though more outside of your control and ability to review of course), that's how.
I'm working on some serious data analysis + realtime async code, and I use 200-400 million tokens a day with Claude Code alone (via ccusage). The complexity of the code seems to have a big impact on the number of tokens used. On simpler projects I use many fewer tokens.
My programming endurance is much greater now (2-3x focused hours per day), my productivity per hour is multiples higher, and I code seven days a week now because it's really exciting.
All told, I would pay for these tools as much as I would pay for full-time human programmer(s).
> I'd much rather hire a junior engineer who spends $100-$200/month
I'd much rather hire a junior engineer at $1.20/hour too! Can you hook me up with your contract services provider?
Obviously I know you're talking about AI costs only. But the idea of doing that analysis without looking at the salary of the person running the tool seems to be completely missing the point.
Now, sure, there are legitimate arguments to be made about efficacy and efficiency and sustainability and best practices. But, no, $100k/year absolutely doesn't need to be "justified" if it works. That's cheaper than the alternative, and markedly so.
In your fictional world you hire a junior who will write code manually, right?
First, I interview people. Junior skills in manual coding dropped sharply this year. These are people who started their schooling coding manually and switched mid-course. In two years there will be no such people.
Well, that will never happen anymore in this world unless we go back to caves, especially for juniors. A junior who writes good code is already a dying unicorn.
The outcome will be ... you will hire a junior ... who will burn more tokens, and the chances of mistakes with a less expensive model and fewer tokens are even higher.
I mean even the normal people we get in interviews have no clue, like 80% are just ignorant.
I stopped an interview after 5 minutes: when I asked what ls -ahl does, he started telling me how he vibe/AI codes stuff and that's his workflow. Okay, if you don't know the basics, guess what? Everyone can replace you, or at least I'm not hiring you (I only told him that's not what we are looking for and thanked him).
The fully loaded cost of a senior engineer is already well past 400k. +5k a month is not that much if it helps them be XX% more productive.
Personally at a different big tech I'm in the mid 4 digits AI spend per month and it helps me a lot, basically all coding has been trivialized and I work on an extremely large codebase. I'm spending more time on things closer to direct value generation like data analysis and experiment tweaking rather than spending time moving a variable across 10 layers of abstraction and making sure code compiles.
Yes, my thoughts exactly. Productivity by definition creates things, hopefully valuable things. Is all the extra burn on chatbots worth the cost? Has Uber somehow gotten dramatically more efficient and effective due to this massive budget overrun? Or have they just given people shiny and expensive ways to push the same work around?
> If it was actually productive, then the revenue would increase and affordability wouldn't be a question.
Revenue has increased. Have you seen Meta's latest earnings? +33% revenue - in this economy.
Affordability is not a question. There is a reason companies like Meta have no issue with their engineers spending $1k/day on tokens. It's just not that much compared to how much they make per employee.
That means absolutely nothing in the context of this conversation. It says right in their release that ad impressions are up almost 20% and cost per ad is up 12%. Those two metrics alone account for most of the increase in their revenue. Absolutely no conclusion can be drawn regarding the impact of AI on those numbers one way or the other.
It's not like they used AI to crank out some new revenue generating piece of software, or massively reduce operating costs. In fact their operating costs rose by 35%.
I'd argue it's often the contrary -- since it's easy to ship features and fixes, people often ship things without questioning if it makes business sense to support a use case, or if the design is solid. Now you have exactly the same revenue but more things to maintain.
This is my thought too. The eggheads in accounting set budgets, and we produce products within that budget. I could be twice as productive with twice as many people, and maybe 50% more productive with good AI, but if it's not budgeted for it's an issue (especially short-term before the product is released).
That is not true at all. How "productive" a company is means nothing if people aren't buying its product. And using LLMs to be more productive will not convince anyone to buy your product. Human creativity and intuition to make a product that people want to use is what sells. Productivity for productivity's sake doesn't really move the needle at all, and can make things worse.
It's actually incredible the extent to which non devs imposing KPIs on devs underestimate how badly this will get gamed, whether it's AIs, PR/line counting or whatever.
Gaming is one thing, fundamentally not understanding how engineering works will lead to shittier outcomes and cost the company in ways the management will never understand.
Management in the age of AI is falling for the doorman fallacy wrt engineering. If lines of code were the most valuable aspect of software engineering, my front end JavaScript intern would’ve been the most valuable person in the company. https://www.jaakkoj.com/concepts/doorman-fallacy
Exactly. At Cerebras I know of several people who burn tokens on completely USELESS tasks (randomly changing pixels in an image) just to keep them high up on the token leaderboard.
I suspect the other tokenboard leaders are doing the same. They made the metric "token usage" (which is just a proxy for LOC) so that's what they're gonna get.
I don't understand this critique.
(1) Did you previously think you weren't getting paid for doing what a company wants you to do, aka what THEY thought was productive?
(2) Do you think all this AI generated code is useless?
I think the point was that, when you make a metric goal of "you must use AI this much", then people will use AI even in ways that aren't adding to productivity.
To answer your second question: Yes, much of it is worse than useless. The tools need guidance to produce useful output. If you use it poorly, you will get garbage output that may do more harm than good.
And your response does not address the point being made in the comment you replied to: Many people are being evaluated by how many tokens they burn, which is about as good a metric as lines of code written.
1) I think if the company I work for spends too much effort on things that aren't going to make money, they won't be able to pay me anymore, no matter what they "think" is productive. That's not how executives at companies like this make decisions, though.
I think parent is saying "% of code being generated by AI" is not a generally good, direct metric for business value. It's akin to the "we are pushing SO MUCH CODE" phase of early AI marketing.
If we're trying to measure the value of adopting a tool, it's probably better to measure the ROI of that tool rather than the usage % of that tool, especially when usage is basically mandated.
To directly answer your questions:
1. You're being paid to create value for the business, which "doing what they think is productive" is a proxy for. You're not being paid to use a tool a high % of the time.
2. It doesn't seem like parent even commented on the quality of the code generated. I think anyone that uses it regularly can agree that:
a) the code is not useless,
b) not all generated code is immediately production ready, and
c) AI generation of code is an accelerant for software development.
Goodhart's Law isn't a problem immediately. If you want more code to be written, and the only feasible way to write it to goals is to heavily use AI, then you might run into the problems of AI-generated code, and an infrastructure that's poorly architected and much less understood than it would've been ten years ago.
1. At my level, the company is not just paying me to do a task the way they want it done, they are paying for my experience to orchestrate the best way to do it. They want an outcome, and I'm responsible for figuring out how to get to that outcome with the right balance of cost, correctness, etc. But yes, the most dystopian reality is what you said.
2. It's not useless, but the AI generated code is absolutely lower quality than what I would have written myself, and there is no desire to clean it up. Companies have always had a disastrously bad understanding of technical debt and they finally have a tool they can shove down developers' throats that trades even more velocity for even less quality. They're going to take that trade every single time.
> (1) ...getting paid for doing what a company wants you to do...?
At my previous company, when the thing they thought they wanted me to do (which was not the thing they actually wanted... but whatever) diverged from my values I quit. You can just do things.
> (2) Do you think all this AI generated code is useless?
Almost universally, yes. Especially in organizations that historically haven't been particularly careful about hiring and have a huge number of young, inexperienced people. There are exceptions but they're rare enough that throwing that particular baby out with the bathwater isn't a big loss.
you're missing their point; LLM use is often a part of your evaluation at some of these larger companies and they expect you to use them heavily or you will get a lashing
GP is just saying that any metric will be gamed, and if there is some cost associated with that metric, the cost will grow. Let’s say you set a metric that says the most productive devs are the ones with the most file changes: you can soon expect every function and structure to be its own file. Same if you say that sales commissions are based on how much time you spend calling: expect the phone bills to grow a lot.
I love how these articles drop, and all of a sudden HN is filled with people who think engineering productivity is simple to measure.
Yes, productivity implies revenue (or cost reduction), and revenue is measurable.
However:
1. You spend money today to build features that drive revenue in the future, so when expenses go up rapidly today, you don’t yet have the revenue to measure.
2. It’s inherently a counterfactual consideration: you have these features completed today, using AI. You’re profitable/unprofitable. So AI is productive/unproductive, right? No. You have to estimate what you would’ve gotten done without AI, and how much revenue you would’ve had then.
3. Business is often a Red Queen’s race. If you don’t make improvements, it’s often the case that you’ll lose revenue, as competitors take advantage.
4. Most likely, AI use is a mixture of working on things that matter and people throwing shit against the wall “because it’s easy now.” Actually measuring the potential productivity improvements means figuring out how to keep the first category and avoid the second.
This isn’t me arguing for or against AI. It’s just me telling you not to be lazy and say “if it were productive you’d be able to measure it.”
> HN is filled with people who think engineering productivity is simple to measure.
I think the prevailing (correct) consensus is that developer productivity is actually very hard to measure, and every time it is attempted the measure is immediately made a target, making the whole thing pointless even if it had been a solid measurement, which it wasn't.
IDK where you're getting the idea here that measuring productivity of anyone who isn't a factory worker is easy.
Is it easy to measure a factory worker's productivity? It would seem surprising and interesting if every job's productivity is hard to measure except for one particular kind.
This is the message that somehow the tech industry is constitutionally incapable of absorbing. The "innovation impulse" is cancer. I have no idea why tech managers keep harping on about "innovating", it's so bizarre.
I mean, the option is not zero productivity or some productivity: it could be negative.
We doubt the productivity because we have enough experience with Claude Code to know that flooding your organization with that many tokens isn't just unproductive, it's actively harmful.
Minor shifts in productivity are hard to measure. Major jumps in productivity would be obvious. I think it’s clear that, if AI is affecting productivity, it’s to a minor degree at best.
If it were 10x productive you'd be able to measure it indirectly, you'd be unable to avoid measuring it. So the initial claims were clearly lies. The research question is:
Is it >1.0x productive?
I agree that's very hard to measure. But given what this shit costs, it had better be answerable, and the multiple had better justify the cost.
> figuring out if the company can afford this level of productivity at scale
This is the thing that boggles my mind. They spent their budget. They have 4 months of data. What do they have to show for it?
I'm not a hater; I'm not a luddite. I have a $200 Max plan and I use it.
But are you saying that Uber made this tool available, urged everybody to use it, and is confused about what happens when it worked? It's one thing if they decide AI isn't productive enough to be worth the cost.
Are they out of ideas on what to build next, or something?
The personal max and teams plan actually are an amazing bargain compared to the API PAYG cost you get with Enterprise. I guess they really need their Enterprise features though, otherwise they could just tell users to expense a $200 max sub. Enterprises gonna Enterprise.
My guess is nothing you can see right now, since it likely takes a lot longer for any substantial external-facing changes to roll out broadly. Internally I'm sure several features have moved faster. I've noticed this at Salesforce where it certainly seems like things that would have taken a few weeks take a few days now. This doesn't translate directly to more money, just more potential to make money.
> Are they out of ideas on what to build next, or something?
Well, what is there for Uber to build next? They have their ride hailing platform. It works. They have adapted it for other kinds of delivery (food, groceries, "anything that fits in a car"). What else is there in the "someone driving a car" space for them?
> I'm not a hater; I'm not a luddite. I have a $200 Max plan and I use it.
I'm glad to see we've reached the point of AI discourse at which anything that might be construed as criticism must be prefixed by "I'm also part of the cult, I'm not a non-believer, but" to avoid being dismissed as a heretic.
Speaking as someone who's bootstrapping here, I'm often envious of engineers at these larger companies, but I also worry that the incentives are screwed up.
If I were an engineer at Uber, why wouldn't I select gpt 5.5 pro @ very high thinking + fast mode for a prompt? There's no incentive not to use the most powerful (and thus most expensive) model for even the smallest of changes.
I tried one of these prompts for some tests I'm doing for image->html conversion, and a single prompt cost me $40. For someone that's paying that themselves, I'd pretty much never use this configuration. For someone at a large company where someone else is footing the bill, I'd spin these up regularly (the output was significantly better, fwiw). Engineers are rated on what they deliver, not on the expenditure to get there.
There are ways to do this cheaply, but there are no incentives for engineers to do so.
SWEs are expensive; median salary is $133k (not counting health insurance, payroll taxes, etc). At roughly 2,000 working hours a year that is about $66.50 an hour, so if you can shave off an hour of dev time with $40 in LLM credits, that's $26.50 cheaper than having them do it without.
I'm not entirely convinced it works out that way so far, but that's the theory.
Trying to bring down LLM costs is sort of a double-edged sword, because the dev needs to be cutting LLM costs by more than what you're paying them. If it takes them a day to bring costs down by $1 an invocation, then it takes almost 2 years to recoup the salary costs. It's worse because LLMs currently change so much I wouldn't be confident that their solution won't be broken before the 2-year period is up. Will we still be tool calling in 2 years, or will that be something new? Will thinking still be a thing, or will it be superseded by something else? I don't think anyone knows, even the frontier providers.
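To make the arithmetic in this comment easy to poke at, here is a rough sketch in Python. The 2,000 paid hours a year, the 8-hour day, the 260 working days and the "one invocation per working day" rate are all my assumptions, not figures from the comment; with them, the $26.50 saving falls out exactly and the payback time lands at about two years.

```python
# Back-of-the-envelope check of the salary vs. LLM-credit trade-off.
# ASSUMPTIONS (not from the thread): 2,000 paid hours/year,
# 8-hour working days, 260 working days/year.
HOURS_PER_YEAR = 2_000
HOURS_PER_DAY = 8
WORKING_DAYS_PER_YEAR = 260


def hourly_rate(annual_salary: float) -> float:
    """Salary-only hourly cost of a developer (no benefits or taxes)."""
    return annual_salary / HOURS_PER_YEAR


def net_saving_per_hour(annual_salary: float, llm_cost: float) -> float:
    """Money saved when llm_cost in credits replaces one hour of dev time."""
    return hourly_rate(annual_salary) - llm_cost


def payback_years(
    annual_salary: float,
    dev_days: float,
    saving_per_invocation: float,
    invocations_per_working_day: float,
) -> float:
    """Years until the salary spent on an optimization is recouped."""
    effort_cost = hourly_rate(annual_salary) * HOURS_PER_DAY * dev_days
    invocations_needed = effort_cost / saving_per_invocation
    working_days_needed = invocations_needed / invocations_per_working_day
    return working_days_needed / WORKING_DAYS_PER_YEAR


if __name__ == "__main__":
    salary = 133_000
    print(f"hourly rate:        ${hourly_rate(salary):.2f}")
    print(f"saving per hour:    ${net_saving_per_hour(salary, 40):.2f}")
    # One day of work to save $1 per invocation, one invocation per working day.
    print(f"payback (1/day):    {payback_years(salary, 1, 1.0, 1):.2f} years")
    # The same optimization pays back far faster on a hot path.
    print(f"payback (100/day):  {payback_years(salary, 1, 1.0, 100):.3f} years")
```

The payback time is inversely proportional to how often the optimized call runs, so the two-year figure only holds for something invoked about once a day; for anything invoked hundreds of times a day the same day of work pays for itself within days.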
How could they implement it? Try testing a bunch of models (closed and open sourced) and then seeing which one gives the best returns for its cost? And then how do they check if it's being properly used? I have read of people just throwing their token budgets to the fire so that they show high usage for KPIs. While the most obvious cases of "X do this very wasteful thing" will be culled quickly (hopefully), I don't see how non-technical management can see through the thinnest layer of malicious compliance.
According to [1], there are about 5500 people in Engineering at Uber. Using $1250 as the mid-point of the $ spend range, that comes to about $6.8 Million in engineering AI spend, ballpark, with the range being $2.75 Million - $11 Million. The article lists $3.4 Billion as the R&D spend.
The AI spend does not appear to be a significant chunk of R&D spending (0.3% in 4 months or 1% annualized). If they didn't plan for it, sure, it's not peanuts in the budget, but in context not that much.
The real question is, what did they get for that amount? The article claims that 70% of code commits are now AI-generated, so presumably the code passed review and tests. Did it accelerate the feature count? Did it reduce quality problems? Did it lead to other benefits?
Sadly the article is silent on the outcomes, besides the higher spend.
Maybe 4 months is too soon to assess the benefits. On the other hand, in an agile world ...
The actual source https://www.theinformation.com/newsletters/applied-ai/uber-c... says "about 11% of real, live updates to the code in its backend systems are being written by AI agents built primarily with Claude Code, up from just a fraction of a percent three months ago" and "He wouldn’t disclose exact figures of the company’s software budget or what it spends on AI coding tools."
> that comes to about $6.8 Million in engineering AI spend
That would be per month. Per year it would be $81.6M.
A small fraction compared to the R&D budget but still a huge amount of cash to spend on something with (apparently) very little impact on the whole business.
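For anyone who wants to redo the numbers in this subthread, a quick sketch follows. The headcount, the per-engineer range and the R&D figure are the ones quoted above; treating the $3.4 Billion as an annual figure is my assumption. Note that the exact midpoint is $6.875M a month, which the thread rounds to $6.8M, so the annual total here comes out slightly higher than $81.6M.

```python
# Figures quoted in the thread; RND_BUDGET_USD is ASSUMED to be annual.
ENGINEERS = 5_500
MONTHLY_SPEND_RANGE_USD = (500, 2_000)  # per engineer, per month
RND_BUDGET_USD = 3_400_000_000


def monthly_total(per_engineer_usd: float) -> float:
    """Company-wide monthly AI spend at a given per-engineer rate."""
    return ENGINEERS * per_engineer_usd


def annual_share_of_rnd(per_engineer_usd: float) -> float:
    """Annualized AI spend as a fraction of the R&D budget."""
    return monthly_total(per_engineer_usd) * 12 / RND_BUDGET_USD


if __name__ == "__main__":
    low, high = MONTHLY_SPEND_RANGE_USD
    mid = (low + high) / 2
    for label, rate in (("low", low), ("mid", mid), ("high", high)):
        monthly = monthly_total(rate)
        share = annual_share_of_rnd(rate)
        print(
            f"{label:>4}: ${monthly / 1e6:.3f}M/month, "
            f"${monthly * 12 / 1e6:.1f}M/year, {share:.1%} of R&D"
        )
```

Under these inputs the annualized share is in the low single digits of a percent at every point in the range, which is consistent with both readings above: small next to the R&D budget, large in absolute dollars.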
As it becomes more common for executives to think we can replace software engineering with agents, I wonder if they might be basing their decisions off of unrealistic perceptions of the average software engineer. I guess I'm mulling two somewhat contradictory senses:
1. You get out of it what you put into it. A savvy CTO might be incredibly excited by everything they can do with agents, and improperly think that all the software engineers can do the same thing, when in reality your org's average software engineers might not have the creativity to even think of many cases where it could save them work. So by mandating agent usage, you might find that productivity hasn't improved while AI costs have increased.
2. When using AI, there are two gaps that become more obvious. First is the gap of: who tells the agent what to do? In many orgs, product isn't technically savvy enough to come up with a detailed spec/plan that an LLM can use. And many cog-in-machine developers aren't positioned to come up with the spec, they just want to implement it. By expecting work to be implemented by agent-using developers, you might instead find a lot of idle workers waiting for work to show up. Second is the QA/review cycle. You've introduced a big change to the org but are you really saving cost or shifting it?
I'm all for introducing LLMs as optional to help existing developers increase velocity and quality, but I think the "let's restructure the org" movement is really dicey, especially for mid-size or smaller employers.
Beyond that, it's a force multiplier and it doesn't care if the force is positive or negative. Someone with poor software engineering principles can use AI to make an absolute mess quickly.
Related to 2, my company is strongly pushing for developers to have a product mentality and be less of just a cog in a machine.
I am biased because I have more of a product mentality than other developers, but I think these are the people better positioned to be more productive with agents: know enough tech to be able to implement things with agents, and know enough product to know what should be implemented.
This is a really underrated comment. It’s a great question and speaks volumes as to what the hell so many modern tech cos are actually doing with all their resource. Didn’t Elon strip most of the team at Twitter away and, after some awful false starts, it pretty much ran fine on about 80% less human resource?
1) the minimum number of employees it takes to maintain the core product
Vs 2) All the employees that it makes sense to hire for revenue and market expansion.
Internet comments usually assume that (1) is the goal. But think of say the sales department. If every salesperson you hire brings in new company revenue that’s greater than their salary + overhead, then why not hire 1000 of them?
It's very easy to blow through hundreds of dollars a session using API tokens especially with the 1m context if you aren't careful about clearing old context.
At the same time the subscription will allow the same usage for hundreds of dollars a month.
Either Anthropic is absolutely hosing API users, massively subsidizing subscriptions, or a little bit of both.
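A toy model of why uncleared context gets expensive on API billing: each request resends the conversation so far, so billed input tokens grow roughly with the square of the number of turns. This is only a sketch; the per-turn size and the price per million input tokens are made-up placeholders, and it ignores output tokens, prompt caching, and the summary that would normally survive a context clear.

```python
from typing import Optional


def cumulative_input_tokens(
    turns: int, tokens_per_turn: int, clear_every: Optional[int] = None
) -> int:
    """Total input tokens billed when every request resends the history.

    clear_every: if set, the context is dropped after that many turns
    (a simplification: a real clear usually keeps a short summary).
    """
    total = 0
    context = 0
    for turn in range(1, turns + 1):
        context += tokens_per_turn  # history grows by one exchange
        total += context            # the whole history is sent as input
        if clear_every and turn % clear_every == 0:
            context = 0
    return total


def input_cost_usd(tokens: int, usd_per_million: float) -> float:
    """Cost of the given input tokens at a flat per-million price."""
    return tokens / 1_000_000 * usd_per_million


if __name__ == "__main__":
    PRICE = 5.0  # HYPOTHETICAL USD per million input tokens
    never = cumulative_input_tokens(100, 5_000)
    cleared = cumulative_input_tokens(100, 5_000, clear_every=10)
    print(f"never cleared:    {never:>12,} tokens  ${input_cost_usd(never, PRICE):.2f}")
    print(f"cleared every 10: {cleared:>12,} tokens  ${input_cost_usd(cleared, PRICE):.2f}")
```

With these placeholder numbers the same 100 exchanges cost roughly nine times more in one long-lived conversation than in ten shorter ones, which is the effect being described.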
"Cursor estimated last year that a $200-per-month Claude Code subscription could use up to $2,000 in compute, suggesting significant subsidization by Anthropic. Today, that subsidization appears to be even more aggressive, with that $200 plan able to consume about $5,000 in compute"
Really curious how many people actually get close to that level of usage? Their general business plan only offers the $100 version, with pay-as-you-go above that.
If 95% of people are using $100 of value a month, the whales may not be hurting them that badly.
That’s based on Anthropic’s retail price right? Not a fair comparison, like saying that Netflix must be losing money because every movie rental is $4 and a Netflix subscriber can watch 20 movies in a month.
Anthropic has a very "interesting" business model where you get subscription pricing as long as you are under 150 employees. When you hit 151, you have to start paying API prices overnight for everyone, and your total bill instantly multiplies.
They are getting you hooked on cheaper tokens, then raking you in when you get scale. I'm sure Uber gets a break on list price, but I doubt they are anywhere near <150 employee subscription pricing.
Is that known to be true? Enterprise pricing is opaque. I am aware of at least one 151+ organization using a flat-cost $150-200/mo per seat Claude Premium contract. Reportedly most employees don't need to top up with additional API usage to cope with token limits.
Yeah, it's basically the opposite of how "product-led growth" SaaS works. Generally pay-as-you-go pricing is expensive at scale, but attractive initially. So you start on a pay-as-you-go plan, but as you scale you end up transitioning off pay-as-you-go to a negotiated commit. I.e. you call sales and sign a contract. Anthropic basically flips that around backwards.
I evaluated the pricing and could not justify the jump to Enterprise from Team. You lose the monthly subscription entirely when you jump to enterprise so you lose your ability to control costs.
You can cap per user, but without the rolling cap, are you really just going to tell a member of your team “No AI for the rest of the month”?
I've been able to get by with the $20pm Pro subscription and reap great value out of Claude Code.
I feel like it really is about:
- Don't feed it the works of Shakespeare into the context window if all it's working on is a few files. I actually don't have a Claude.md file in my projects.
- I write the prompt as if I was giving instructions to another developer or to myself on how I want to approach a specific coding task, with a numbered step plan. I've actually been able to take the details written into a Jira ticket on a work project, feed it into Claude Code, and get really good results from it.
- If you are responsible for the output, then you need to review the output - that does put a natural constraint on the tool's usage, but ultimately it is you who uses the tool, not the other way around.
I feel like that's the thing - you have to find the right cadence, just like with running or driving a car - you need to find the level at which you control the car, at which you maintain a consistent pace, and at which you get code that does what you need it to do and meets the quality threshold you want.
Have we reached a point yet where companies are spending millions a year on software licenses, cloud and AI to the point where the return isn't worth it?
Years ago I did work for a company that was spending over a million on Oracle product licenses and I was part of the consultant team they hired to rip it all out and just go for simple maintainable code based on open source products. Not only did it transform into a codebase that the average newly hired developer could maintain, you also had the savings of not paying Oracle a significant portion of your revenue.
I feel like that will repeat itself in a few years time with the current cloud and AI train everyone is on.
I haven't been in a professional setting for a while, I just code for fun nowadays so perhaps I'm somewhat out of the loop.
It's obvious that the word productivity has been used in this discussion to mean something other than the plain meaning of the word. If AI was productive, there would be no question about whether it could be afforded. If you're asking whether you can afford it then it isn't productive by definition.
They are using it to mean a mechanism that produces prodigious amounts of toxic waste. That does not conform to the historical understanding of the word.
> Monthly API costs per engineer ranged from $500 to $2,000 as adoption skyrocketed across the company.
That's...not exactly a lot per engineer. It sounds like they just didn't budget correctly. Especially if the net of that work is more features that would have otherwise required hiring more engineers, which would cost a lot more than $500 to $2000 a month.
No, it's really not a lot at all, especially if you've got a mandate to maximize your AI usage, which many engineering orgs have right now. I burned $216 USD using Claude Code in March just doing some casual development on the side and certainly not as a part of any professional workplace mandate.
Tokenmaxxing seems more and more like a way to encourage experimentation and learning, and incidents like this are a part of learning. Like, today devs simply use the most expensive model by default, even to do extremely simple things. This is obviously wasteful and costly, and budgets will soon be imposed, but this is how they're figuring out the economics.
For instance, like we estimate story points, we may estimate token budgets. At that point, why waste time and money invoking a model for a simple refactor when you could do it with a few keystrokes in an IDE? And why use a frontier model when an open-source local model could spit out that throwaway script? Local models can be tokenmaxxed, but frontier models will still be needed and will be used judiciously. Those are essentially trade-offs, and will eventually be empirically driven, which is what engineering is largely about.
So economics will soon push engineers back to do what they're paid to do: engineering. Just that it will look very different compared to what we're used to.
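One way to picture the trade-off described here is a tiny router that sends each task to the cheapest option able to handle it. Everything in this sketch is hypothetical: the tier names, the three-step difficulty scale and the prices are placeholders I made up, not real products or rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    capability: int                # made-up scale: 0 mechanical, 1 simple, 2 hard
    usd_per_million_tokens: float  # placeholder price, not a real rate


# HYPOTHETICAL tiers, ordered only for readability.
TIERS = [
    Tier("ide-keystrokes", 0, 0.0),
    Tier("local-open-model", 1, 0.5),
    Tier("frontier-model", 2, 15.0),
]


def pick_tier(difficulty: int) -> Tier:
    """Cheapest tier whose capability covers the task difficulty."""
    capable = [t for t in TIERS if t.capability >= difficulty]
    if not capable:
        raise ValueError(f"no tier can handle difficulty {difficulty}")
    return min(capable, key=lambda t: t.usd_per_million_tokens)


def estimated_cost(difficulty: int, estimated_tokens: int) -> tuple[str, float]:
    """Tier name and estimated spend for one task."""
    tier = pick_tier(difficulty)
    return tier.name, estimated_tokens / 1_000_000 * tier.usd_per_million_tokens


if __name__ == "__main__":
    for task, difficulty, tokens in [
        ("rename a variable", 0, 50_000),
        ("throwaway script", 1, 200_000),
        ("cross-service redesign", 2, 2_000_000),
    ]:
        name, cost = estimated_cost(difficulty, tokens)
        print(f"{task:<24} -> {name:<18} ${cost:.2f}")
```

The hard part in practice is the difficulty estimate itself, which this sketch simply takes as an input.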
Yeah, and I expect estimating token budgets is going to follow the same trajectory (along with the same accompanying annoyances) as estimating and tracking story points!
But done with the right mindset and proper awareness of the inherent uncertainty, you can sometimes achieve some reasonable estimates over time by starting with some T-shirt size estimates and then adapting based on actual numbers. Soon enough the team gets a sense of the nuances of the project and its dependencies, and estimates get more accurate.
As such, the example of estimating CPU cycles for tasks is actually relevant. For instance it is a common practice in real-time embedded systems running on tiny micro-controllers. But it is also possible to get good estimates for more complex applications / OS's / architectures simply by benchmarking them over time.
The most common problem with planning and task estimation is that the corporate dynamics around it are not healthy: leadership often uses those as an SLA instead of the SWAG that they are. I worked on a team where our estimates never matched the actual time taken, partially due to rather unpredictable dependencies and high-priority tasks frequently interrupting us. But because we were clearly very high-functioning, management never held that against us. Those were some healthy corporate dynamics; not all places have that.
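The "start with T-shirt sizes, then adapt to actuals" idea can be sketched with an exponential moving average per size bucket. This is just one possible way to do the adapting, chosen by me for brevity; the class name, the starting guesses and the smoothing factor are all hypothetical.

```python
class TokenEstimator:
    """Per-size token estimates nudged toward observed usage.

    Uses an exponential moving average: each recorded actual pulls the
    estimate for its size bucket by a fraction `alpha` of the gap.
    """

    def __init__(self, initial_guesses: dict[str, float], alpha: float = 0.3):
        if not 0 < alpha <= 1:
            raise ValueError("alpha must be in (0, 1]")
        self.estimates = dict(initial_guesses)
        self.alpha = alpha

    def estimate(self, size: str) -> float:
        """Current token estimate for a T-shirt size."""
        return self.estimates[size]

    def record(self, size: str, actual_tokens: float) -> None:
        """Fold one finished task's real token usage into its bucket."""
        old = self.estimates[size]
        self.estimates[size] = (1 - self.alpha) * old + self.alpha * actual_tokens


if __name__ == "__main__":
    # HYPOTHETICAL starting guesses, in tokens.
    est = TokenEstimator({"S": 100_000, "M": 1_000_000, "L": 10_000_000})
    for actual in (2_500_000, 3_000_000, 2_800_000):
        est.record("M", actual)
        print(f"M estimate after {actual:,} actual: {est.estimate('M'):,.0f}")
```

A higher `alpha` reacts faster to recent tasks and a lower one smooths out the interruptions and odd dependencies mentioned above; either way the output is a SWAG, not an SLA.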
While this is a fundamentally stupid story to begin with, it was at least reported somewhat better in other venues. The original report came from The Information, and at least this Yahoo Finance[0] writeup mentioned that. This article has very little content and no sourcing.
Bizarrely I feel like that reflects how a lot of tech leadership are viewing it? I can't explain this behavior but this is the first time I've seen this inversion: leaders believing money spent on something is itself value. I have dev friends who are legitimately under an edict to burn more tokens! It's freakish.
I think the tech industry in general is taking advantage of the fact that software productivity is hard to quantify to say whatever they want about their AI productivity gains. Apparently we are past the point of having to justify anything and can just equate increased AI spend with success.
If they burned through their ML budget in four months while using heavily subsidized models, we're going to see companies burn through their ML budgets in less than a week once those subsidies are no longer in place and they have to pay per token used.
I didn't see the article mentioning the outcomes achieved because of using AI compared to not using AI. I might be missing it. Mainly, Uber is a business. So profit & loss - both need to be measured to understand the equation.
AI might not make engineering cheaper — just more elastic.
Instead of paying for engineers, you’re effectively paying per unit of thinking.
At scale, that could get very expensive very quickly.
I spend $20/month on Gemini Pro and it greatly increased my productivity. I'm still in charge and only use AI for the more tedious or toughest problems. I can't see how these people could be spending this much productively.
I don't know, maybe this will make companies see the actual value in their engineering team. In my company they are starting to see the rotten fruits of the AI push, but it's come at the cost of many jobs, little planning and big ideas.
Exactly how Anthropic, OpenAI and co are selling it.
In the Uber Eats app I can't even request a refund for an incorrect order anymore, because the UI doesn't allow me to scroll down to the "submit" button.
It's been like this for months. I finally got my explanation.
I didn't see a bit where they said how this transformed into more productivity and more profit? What is the point in using AI to make developers more productive if you don't either have more features coded making more money, or fewer developers saving cost?
Not surprising, hit my 5h limit on Claude Code Max Plan, had some credits so switched to extended (api). 40 minutes later $30 credits gone... so yeah, I can see how this can happen.
I use a cli tool to build a document of all relevant code and then use ChatGPT 5.5 pro to plan a feature and generate an implementation plan, and then review and edit and paste it into codex on high to implement.
And it works because it won’t stop until the Rust compiles. But the code is garbage and makes bad decisions that no junior would. It's unmaintainable junk, and sometimes I spend more time refactoring than if I had just built it myself.
People here are talking about generating 100ks of LoC a month, and I’m wondering if it’s a skill issue with me or with Codex, or if I should pull all my investments out of companies heavily invested in AI, like Uber.
AI coding tools probably need the same boring governance as cloud spend: budgets, alerts, team-level visibility, and a way to spot runaway usage before finance notices.
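A rough sketch of what the "budgets and alerts" part could look like, assuming usage is already exported as `(team, dollars)` records. The `spend_alerts` function, the 80% warning threshold, and the team names are hypothetical.

```python
from collections import defaultdict


def spend_alerts(usage, budgets, warn_ratio=0.8):
    """Return {team: "warn" | "over"} for teams at or above a budget threshold.

    `usage` is a list of (team, dollars) records; `budgets` maps team to a
    monthly dollar budget. Teams without a budget are ignored.
    """
    totals = defaultdict(float)
    for team, dollars in usage:
        totals[team] += dollars

    alerts = {}
    for team, budget in budgets.items():
        spent = totals[team]
        if spent >= budget:
            alerts[team] = "over"
        elif spent >= warn_ratio * budget:
            alerts[team] = "warn"
    return alerts


# Illustrative data only.
usage = [("payments", 400.0), ("payments", 450.0), ("maps", 100.0), ("search", 1200.0)]
budgets = {"payments": 1000.0, "maps": 1000.0, "search": 1000.0}
alerts = spend_alerts(usage, budgets)
print(alerts)
```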
We run an agentic pipeline in a different domain (data sourcing), and the only way the math works is to be ruthless about which stages actually need which model.
As a founder, the question I always have is "what is the marginal value per token relative to engineer-hours saved." More of a gut feel at the moment, but would be great to calculate.
This is pointless without knowing what they are measuring. You could genuinely be moving faster, or you could be optimizing for engineers in a rat race to push more code because all their peers are now doing it, because those are the metrics you are measuring for "AI productivity".
Imagine making your product compliant across 100+ countries while regulations, labor laws, tax rules, insurance requirements, and data privacy laws keep changing.
Imagine integrating dozens of payment methods - many of them highly localized - across emerging and developed markets, while dealing with fraud, chargebacks, KYC, AML, and settlement complexities.
Imagine processing trillions of data points every day - rides, location updates, pricing signals, ETAs, traffic conditions, demand forecasts, payments, support events... storing it efficiently, querying it in near real time, generating reports, and keeping the whole pipeline reliable. I have worked in data engineering, and can tell you confidently that this alone requires an enormous R&D budget.
Then there are the apps - not just customer-facing, but driver-facing, courier-facing, merchant-facing, fleet-management, onboarding, support, operations, compliance, finance, and hundreds of internal tools and dashboards.
Then come the integrations. Companies running at Uber's scale generally have hundreds of these - mapping providers, payment processors, banks, identity verification, tax systems, telecoms, customer support platforms, fraud detection, analytics, ERP, CRM, and more.
... And then there are even more...
Real-time routing and dispatch optimization
Dynamic pricing and marketplace balancing
Fraud detection and account security
Driver/rider safety systems
ML models for ETA, demand forecasting, incentives, and churn prevention
Experimentation infrastructure for thousands of A/B tests
Reliability engineering across globally distributed systems
Data centers / cloud optimization at massive scale
Localization across languages, currencies, addresses, and cultural norms
Customer support automation at global scale
Autonomous vehicle research, mapping, and computer vision
... to be fair, this is all I could think of based on my own work experience in related fields... in reality there are definitely at least as many more systems as those mentioned above.
This continues to boggle my mind so hopefully somebody can explain how this is happening.
I’ve been using all these tools since they started popping out around 2021 personally and professionally. I probably built four or five products at this point with assistance, not to mention the thousands and thousands of back-and-forth conversations for research or search or rubber ducking or whatever.
I have never spent more than whatever the professional max plan is that is consistently $20 a month.
I asked a friend of mine who spent a couple hundred dollars in like a few hours how they did it. The answer was that they were basically getting these groups of agents stuck in a loop, constantly generating verbose bullshit that is not even interrogated and doesn’t produce any artifact that is inspectable, no matter how expert you are.
The couple of stories I have heard of these massive crazy spends are people literally just assuming these things can complete an entire human task in one shot, so they continue to hit the “spin the wheel” button until they get something closer to what they want
But I’ve yet to see that actually work, and it flies in the face of every instruction guide, piece of documentation, or prompt engineering process that has been described over the last almost 5 years.
Most people don't have the team and time to do heavy token efficiency engineering. But that's all we do. marketplace.neurometric.ai has a bunch of task specific small models, and we charge flat monthly fees. We bear the token risk.
There is a major disconnect in that people think token usage is exclusively tied to human typing rates...it isn't true. When software developers evolve to using self-managing CLI tools (like Claude Code - the source article mentions this), they are not merely chatting; they are unleashing loops of agency.
When you enter a single request such as "find and fix the memory leak in the billing service", you are not submitting just one request. The tool searches the entire code repository for relevant code, pulls 15 related files into context (easily 200k+ tokens), proposes a fix, runs the test suite and fails, takes the entire stack trace of errors into context, and loops to keep iterating towards the solution. That loop can run 10+ times in a very short period (within 5 minutes). While you grab a cup of coffee you will have consumed $20 in tokens. At the enterprise level (like with Uber), when you multiply that by thousands of software developers using it as a personal shell tool, your budget disappears very quickly.
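The arithmetic behind that coffee-break figure can be sketched in a few lines. The 200k-token context and 10 iterations come from the paragraph above; the price of $10 per million input tokens is a hypothetical placeholder, and the sketch ignores output tokens and any prompt caching.

```python
def loop_cost(context_tokens, iterations, usd_per_million_input):
    """Input-token cost when the full context is re-sent on every iteration."""
    total_tokens = context_tokens * iterations
    return total_tokens / 1_000_000 * usd_per_million_input


# 200k tokens of context and 10 iterations come from the comment above;
# the $10 per million input tokens price is a hypothetical placeholder.
cost = loop_cost(context_tokens=200_000, iterations=10, usd_per_million_input=10.0)
print(f"${cost:.2f}")
```

Under those assumptions the loop alone accounts for roughly the $20 mentioned, before any output tokens are counted.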
And on your point about the junior developer: comparing $100,000/year in tokens to hiring a junior developer is such a false equivalence that it makes you question whether they understand how to make such a comparison.
The cost to a business of one junior engineer with a $100,000 salary is not just the $100,000 in salary but also an additional $40,000+ in benefits and taxes, as well as hardware.
Also, you are disregarding another cost of hiring junior engineers that is their mentorship cost. Each week, your senior and staff engineers spend hours mentoring junior engineers by reviewing their code, pairing with them, and unblocking their progress. Mentoring requires a substantial amount of time and will be expensive to your business.
The return on investment (ROI) for the $10,000 monthly expenditure on tokens is not so much about replacing the junior engineer with the AI. Instead, the ROI is that your senior engineers can use the huge amount of compute power to create boilerplate and tests, and refactor their code 3x quicker than if they had to mentor junior engineers. In addition, LLMs do not sleep, require one-on-ones, or leave for another company for 20% more pay after 18 months, just when their value to the code base has made them an asset to your business.
Lastly, the main reason that Uber has problems with their AI business is that due to the UX of these agentic tools, developers think of the API calls made to the AI as free and as a result, treat them like a basic grep command.
It's funny how Paul is recommending people use PR firms, while in more recent videos Michael Seibel and others have strongly recommended against using them. It's interesting how things shift in ~20 years.
Oh it does... but what happens after 6 months is an entirely different story.
A codebase that has exploded in size 2-3 times in just a few months... internal architecture that is no longer layers of simple parts, but layers of complex architectures corresponding to individual agentic runs... a codebase that now has 10 times more if-else branches and individual code paths because you were not clear enough in your requirements and used the phrase "handle all cases"... a codebase that neither you nor anyone else now understands properly, so nobody can say what's possible anymore, and at what cost, when your manager or PM asks... and finally, due to the combined effect of these, a need for an ever-increasing token budget, and constantly increasing fragility of new AI-generated code due to repeated context compactions.
And we haven't even touched on the security and performance elements yet.
The right way to use these tools is to use them as what I like to call "code-monkeys". You tell them exactly what you want, where you want it, how to do it, how to architect it, and more... and then make them code.
> When developer productivity tools become so valuable that engineers blow the entire budget in four months, the issue isn't the tool but that the budget was invented too early to forecast this adoption curve.
Uber must be the biggest tech company that got lucky with timing. They are so incredibly stupid and incompetent. How on earth do you end up with that cost for AI per user?
I don’t understand. On the ChatGPT pro plan for $200/month, I am essentially running it 24/7 including nights and I can barely get it under the 40% usage mark. Why are companies not using this?
My company has an all you can eat policy, but I think we'd be well served by being thoughtful in optimizing usage so that we still have the overall capabilities but don't burn extra tokens by sloppy use.
I take a peek every month or so at spend for my company and notice more and more people are consuming $1k in tokens a month, and it is bewildering to me how. I use LLMs daily, and see anywhere from $200-$400 tops. This is using the most expensive models, in deep thinking mode. So I'm not a Luddite against the usage of them. I just can't figure out _how_ to burn that much money a month responsibly.
I genuinely challenge someone spending $5-$10k a month to demonstrate how that turns into $50-$100k in value. At a corporate level, I'd much rather hire a junior engineer who spends $100-$200/month and becomes productive than try to rationalize $100k/year in token spend.
> I just can't figure out _how_ to burn that much money a month responsibly.
From my experience, this happens essentially by three means:
- Level 0 (beginner users) long-lived conversations: People who don't get in the habit of compressing, or otherwise manually forcing the model to summarize/checkpoint its work, often end up perpetually reusing the same conversation. This is especially true for _beginners_ who have not spent time curating their _base_ agent knowledge. They end up with a single meta conversation with huge context where they feel the agent is "educated", and feel like any new conversation with the agent is a loss of time because they have to re-educate it.
- Level 1 (intermediate users) heavy explicit use of subagents: Once you discover the prompt pattern of "spawn 5 subagents to analyze your solution, each analyzing a different angle, summarize their findings", it can become addictive. It's not a bad habit per se, but if you're not careful it can drastically overspend your credits.
- Level 3 (expert users) extreme multitasking: Just genuinely having 10 worktrees perpetually in parallel and cycling between them in between agent responses. Again, not necessarily bad in itself, but can exponentially consume credits.
> Just genuinely having 10 worktrees perpetually in parallel and cycling between them in between agent responses. Again, not necessarily bad in itself, but can exponentially consume credits.
I'm pretty sure that growth is linear.
I’ve seen another pattern, I call it “The Document Mongerer”:
I regularly work in a largish monolith. We have micro services too, but most things are in the monolith. Over the years there have been multiple pushes to split it up into micro services. These efforts invariably fail because the _goal_ is the micro service architecture itself instead of something useful to the company, like the ability to do fast releases or better organized code.
Anyways, in the past few months I’ve seen multiple people individually ‘attack’ this insane goal with AI. The first step is always to generate massive amounts of documentation describing the current state of code and proposing areas to split up. Then, after the engineer generates this huge store of documents, they say ‘look what I created’ and then drop it and move on to some other shiny toy. No one will ever read these documents. They are out of date before they ever get ‘completed’; their sole use is to waste credits.
Missing here: some organizations were rewarding high token usage as productivity without critical evaluation. People were afraid to be in the bottom because outcomes weren't being measured.
It is a giant Goodhart's law lesson
Totally agree!
Bonus level "I have a hammer, all I see is nails": using Claude Code for random non-coding work, like dataset cleaning. It's really convenient to have a script spawning Haikus via `claude` CLI and feeding them prompts and JSON files. Money burn potential: practically unbounded, but also it's real work that the product people wanted done, so of course it has a cost associated with it. I'd be bewildered if anyone complained.
Where is level 2?
level 99 - They're using Gas Town
I’m basically doing lvl 3. There’s not a single port in my local worktree’s .env that’s not guaranteed to be unique across all worktrees. Skills for agent to start their own managed dev server, launch their own isolated instance of chrome etc. literally end-to-end code and debug the entire app. I do have to say though you have to know the app inside out and have to have a pretty well groomed backlog in order to run them all in parallel and actually benefit from it.
as a new user of agents, i am realizing i'm using a strategy basically identical to level 0. is the typical approach to just make a CLAUDE.md/AGENTS.md and start a new thread for each task or is it more complicated than that?
I spend about $3k/month (subsidized by the Claude Max plan).
I guess I fall under level 3 (2?): I typically have 3-6 agents working simultaneously on the same feature, they each make worktrees, code, run tests and put up PR’s. I also have Github actions which scan for regressions and security issues on each PR.
It makes my development cycle extremely fast: I request a feature and just look at Github and look for changes to my human readable outputs, settle on a PR, merge, repeat.
The issue is that I am now the bottleneck in my system. I find myself working basically non-stop, because there is always more to do. (Yes I know I can automate the acceptance criteria but that turns to slop real fast)
How do you compress or otherwise force a model to checkpoint?
What about Level 2?
>> Again, not necessarily bad in itself,
Yeah, it is bad. The human brain is not able to properly assess this number of changes. To understand even a small change you need a lot of capacity. To understand thousands of lines is impossible.
This is pure slop pouring into prod, and we can see more and more consequences of this in all the big corps' products: things start to break more often, and exponentially faster.
First: There's the obvious "If the company is letting me do it, I'll be wasteful." This includes not clearing/compacting the context often. Opus now has a 1M context window, and quality is good to at least 200K. So each query is burning a lot of tokens until you clear/compact.
People have already mentioned the size/complexity of the codebase. I'm new to my team and the codebase isn't huge, but it's large enough that there are plenty of parts I have little understanding about. When I'm given a task, then yes, I definitely go to Claude and ask it to find the relevant parts of code so I can understand the existing workflow before even attempting to change it.
The downside is that I don't build expertise. But the reality is that with Claude, I can get the work done in 1 day that would take me 5 days of struggling, and if everyone is doing it, I can't be left behind. So I take the middle route - I get it done in 2-3 days instead of 1 so I can at least spend some time with the code.
Especially with AI, the rate at which code changes in our codebase is insane. So I built a tool that takes a pull request, and tells the LLM to go deep and explain to me what that pull request does. (Note: I'm not the reviewer, I just want to keep tabs on the work that is going on in the team).
And this is just the beginning. I haven't actually spent time to come up with more ways to use the LLM to help me.
My usage is similar to yours, but if I were fairly experienced with the code base, I'd do a lot more. I haven't asked, but I suspect there are people in my team who go over $1K/month.
As always, the bottleneck is proper testing and reviews.
Edit: I'll also add that for not-so-important code used within the company, I suspect most people are going full-AI with it. For my personal (non-work) code, I just let the AI code it all - the risk is usually very low (and problems are caught quickly). If someone is using the "superpowers" skill, then even for basic features you can burn lots of tokens. I usually start with 20-40K tokens and end up with 80-90K tokens when it's finished. Which means that many of the requests prior to completion were sending in close to 80K tokens. Multiply that with the number of queries, etc.
Wasteful, but if someone else is paying ...
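To put a rough number on "multiply that with the number of queries": a small sketch that sums input tokens when the context grows linearly and is re-sent in full on every request. The start and end sizes are taken from the middle of the ranges quoted above; the 12-request session length and the linear growth are assumptions.

```python
def cumulative_input_tokens(start, end, requests):
    """Total input tokens when context grows linearly from start to end.

    Each request re-sends the whole context accumulated so far.
    """
    if requests < 2:
        return start * requests
    step = (end - start) / (requests - 1)
    return sum(round(start + i * step) for i in range(requests))


# 30K -> 85K is the middle of the ranges quoted above; 12 requests is an
# assumed session length.
total = cumulative_input_tokens(start=30_000, end=85_000, requests=12)
print(total)
```

So a single feature that "ends at 80-90K" can plausibly have sent several hundred thousand input tokens along the way, depending on how many round trips it took.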
> This includes not clearing/compacting the context often. Opus now has a 1M context window, and quality is good to at least 200K. So each query is burning a lot of tokens until you clear/compact.
I see this repeated by others, including coworkers. It completely ignores caching. Caching itself is complicated, but the "longer context window = more expensive" is not 100% true and you are hampering yourself if you're not taking full advantage of large context windows.
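A simplified cost model shows why caching changes the picture. Everything numeric here is a placeholder: the price, the 90% cached share, and the 90% discount on cached tokens are assumptions for illustration, and the model ignores cache-write surcharges, cache expiry, and the fact that the first request is normally uncached.

```python
def session_cost(context_tokens, turns, price, cached_fraction=0.0, cache_discount=0.0):
    """Input cost in dollars for `turns` requests that each send `context_tokens`.

    `price` is dollars per million uncached input tokens. `cached_fraction` is
    the share of each request served from cache, billed at
    price * (1 - cache_discount).
    """
    cached = context_tokens * cached_fraction
    fresh = context_tokens - cached
    per_turn = (fresh * price + cached * price * (1 - cache_discount)) / 1_000_000
    return per_turn * turns


# Hypothetical numbers: 200k-token context, 10 turns, $10 per million tokens.
no_cache = session_cost(200_000, 10, price=10.0)
with_cache = session_cost(200_000, 10, price=10.0, cached_fraction=0.9, cache_discount=0.9)
print(f"no cache: ${no_cache:.2f}, with cache: ${with_cache:.2f}")
```

With these made-up parameters the same long context costs a fraction of the uncached price, which is the point being made: context length alone does not determine the bill.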
> But the reality is that with Claude, I can get the work done in 1 day that would take me 5 days of struggling,
Is it really a 5x ROI? Where are all the apps, games, platforms, SaaS products, and features that have been backlogged for 5 years that are all of a sudden getting done? Because I see a modest ROI, and an _awful lot_ of shovelware.
> First: There's the obvious "If the company is letting me do it, I'll be wasteful." This includes not clearing/compacting the context often. Opus now has a 1M context window, and quality is good to at least 200K. So each query is burning a lot of tokens until you clear/compact.
What is wasteful? If you are costing the organization $x/hr, and spend an hour saving the company $(x*0.5), you didn't save money, you wasted it.
To the company, are you spending more time being token efficient to save less money than they're paying you for the time? That's not even getting into opportunity costs.
There is some extreme wasteful spending of AI tokens out there. But trying to get below $3k/month in token costs is often of questionable value.
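The break-even argument above, written out. The $100/hr figure is a hypothetical stand-in for $x; the `net_saving` helper is just the subtraction described in the comment.

```python
def net_saving(hourly_cost, hours_spent, tokens_saved_usd):
    """Dollars saved after paying for the engineer's optimization time."""
    return tokens_saved_usd - hourly_cost * hours_spent


x = 100.0  # hypothetical fully loaded cost per engineer-hour
net = net_saving(hourly_cost=x, hours_spent=1, tokens_saved_usd=0.5 * x)
print(net)  # negative means the optimization cost more than it saved
```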
I have anecdotal examples of Claude Code choosing a solution to a problem that is ridiculously token inefficient.
One example - was giving several agents different sub problems to solve in a complex ML / forecasting problem. Each agent would write + run + read a jupyter notebook. This worked ok, the notebooks would be verbose but it was fine... until one of them wrote out hundreds of thousands of rows to a cell output, creating a 500MB ipynb file. Claude tried several times to read it and it used my entire context limit.
The solution was to prescribe a better structure for doing the work (via CLI analysis scripts + folders to save research results to). But this required some planning, thought, and design work by me, the operator.
When I see people spending $10k a month in tokens, I can only assume they are taking lazy, hands-off approaches to solving problems with the expensive hammer that is claude code. Example: have claude read all your emails every day... the lazy solution is to simply do that, but a smarter solution is to first filter the email body HTML to remove the noise.
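One way the "filter the email body HTML first" step could look, using only Python's standard-library `html.parser`. This is a sketch, not a robust email sanitizer: the `TextExtractor` class and `email_html_to_text` helper are illustrative names, and real mail would also need handling for things like quoted replies, tracking pixels, and MIME parts.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def email_html_to_text(html):
    """Return the visible text of an HTML email body as one string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


sample = "<html><style>p{color:red}</style><body><p>Invoice due <b>Friday</b></p></body></html>"
print(email_html_to_text(sample))
```

Passing the extracted text to the model instead of the raw HTML means the markup and styling never count against the token budget.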
> have claude read all your emails every day...
To be fair, I do that. 2-3 times a day, in fact. Not all of my emails (the archive has ballooned to several hundred thousand messages total), but the most recent ones certainly.
My standard prompt is along the lines of "go through the last N days of my emails, identify all threads that I need to know about, action on or follow up with". N is usually a number between 2 and 5. I've specified a standing set of rules to easily know what is likely a source of noise, to aid in skipping the bot spam.
The company is charged API pricing through an enterprise contract, and I remain persistently curious how much I burn. My daily admin-related token expenses appear to fluctuate between $1 and $5. For something that saves me up to 2h of time a day I consider that a rather tolerable deal. (When I dive in to code to do refactors or deep investigations, I can spend as much as $25 a day.)
> have claude read all your emails every day
But that is exactly what it is sold to people to do as a panacea: consume all the data, produce insights.
Nobody is being instructed to be judicious. Everyone is being instructed to use it as much as possible for all problem areas.
Really depends on the repo you’re working in.
If it’s very large, especially if the tool needs to refer to documentation for a lot of custom frameworks and APIs, you often end up needing very large context windows that burn through tokens faster.
If it’s smaller or sticks with common frameworks that the model was trained on, it’s able to do a lot more with smaller context windows and token usage is way lower.
The codebase and the topic you're working on are huge variables.
I don't use LLMs to write code (other than simple refactors and throwaway stuff) but I do use them heavily to crawl through big codebases and identify which files and functions I need to understand.
Some of the codebases I explore will burn through tokens at a rapid rate because there is so much complex code to get through. If I use the $20 Claude plan and Opus I can sometimes go through my entire 5-hour allocation in a single prompt exploring the codebase, and it's justified.
Other times I'm working on simple topics, even in a large codebase, and it will sip tokens because it only needs to walk a couple files to get to what it needs to answer my questions.
I'm currently in repos where the context window required is so large that the output is almost always "wrong" for the problem at hand. Quite a few people at my company burn through tokens this way, and it certainly isn't providing value to the company.
Raises the question of whether we should move to minimal microservices so that the whole project fits in the LLM's context. I hardly have to do anything when I'm working on a small project with an LLM.
Yes, in a reasonable microservice land where the places you need to connect to are all documented in very concise places, you can have extremely productive $10 days. In the giant monorepo with everything custom, you can't just rely on built-in knowledge of 80% of your libraries, so it's a very different world.
A place like Google has to be so much better off just training library concepts in, given how many of the things the LLM will "instinctively" reach for are unlikely to be available. It's not unlike the acclimation period that happens when someone comes into or out of a company like that, and suddenly every library and infra tool you were used to is just not available. We need a lot more searching when that happens to us, and the LLM suffers from the same context issue. The human just has all of that trained in after 6 months, but the LLM doesn't.
On larger repos it spends a lot of time just finding the one line of code that needs to change. (I have the same problem, as a human!)
So if the AI could do the same work on huge codebases with far fewer tokens, would it be good or bad for the AI companies do you think?
Will this result in people moving away from large monorepos to per-unit, quasi-micro repositories to save in token use?
> I just can't figure out _how_ to burn that much money a month responsibly.
Same but in regards to quotas. I'm on the 200 EUR ChatGPT plan, so presumably have the highest quota, using the "most expensive" models, on highest reasoning, in fast-mode (1.5x quota usage) and after a full day of almost exclusively doing programming with agents, I still get nowhere close to hitting my quota.
In fact, since I started using agents for coding, the only time I even got close, was when I was doing cross-platform development with the same as above, but on three computers at the same time, then I almost hit my weekly quota. But normally, I get down to ~20% of the quota but almost never below that. I don't see how I could either, I'm already doing lots of prompts and queries "for fun" basically.
Codex quota is suspiciously high right now. Either way, the subscription plans are not sustainable, and perhaps less relevant to any discussion about corporate API use. The prosumer developer plans are an insane deal. It is a golden age right now and it will end. If you tried to use the APIs to achieve the same thing, you would be spending thousands upon thousands of dollars a month. My completely unfounded conjecture is that OpenAI is trying to grab developers back from Claude by burning $$$$.
I have to churn to get to my ChatGPT Plus $20 plan limits with gpt-5.5 xhigh. Starts to feel like I'm doing something wrong.
I am running a bunch of autoresearch loops that optimize various compilers, and it's pretty easy to burn through as much money as you want if you have a measurable goal and good tests.
There are tools that let you extract out what the API price would be for a subscription plan use. I typically have monthly runs that are on the order of $2k - $4k at API prices, despite paying a mere $200/mo to Anthropic.
Edit: Just checked with ccusage and I've been doing about $450/day for the last week. A bit more than usual, but I still haven't come close to weekly limits and never hit the 5hr rate limit.
> Same but in regards to quotas. I'm on the 200 EUR ChatGPT plan,
The API rates and monthly plan rates are not the same.
If you're using enough to justify the 200EUR plan (instead of the 100EUR plan), your use might actually be as high as some of the API bills discussed above.
I'm on the same page. Do people not analyze the problems themselves? Are they just copy/pasting their entire ticket description into Claude Code and having it iterate until they land on something that works?
I don't get it.
> Are they just copy/pasting their entire ticket description into Claude Code and having it iterate until they land on something that works?
That is exactly what they are doing, yes
That'd be crazy. The agent has a skill configured to fetch ticket descriptions from Jira by itself. Copy-pasting feels like manual labor.
Not what I do. I'll reformulate the ticket description so that the purpose and as many details as possible about the solution are made clear from the start. Then I tell Opus to go and research the relevant parts of the codebase and what needs to be done, and write its findings to a research.md file. Then I'll review that file, bring answers to any open questions and hash out more details if any parts seem fuzzy. When the research is sound I'll ask Opus to produce a plan.md document that lists all the changes that need to be made as actionable steps (possibly broken into phases). Then I'll let Sonnet execute the steps one by one and quickly review the changes as we go along.
> Are they just copy/pasting their entire ticket description into Claude Code and having it iterate until they land on something that works?
"Their ticket" = one that was AI generated. After which they will wait for their AI-generated PR to be checked by an automated AI QA that will validate it against the AI-generated spec.
It feels like an important metric of "corporate AI adoption" should be how effective the human is at steering the AI.
IF THE HUMAN ISN'T EFFECTIVE, THE HUMAN NEEDS TO GO.
You should.
If it manages to produce a working solution, then great! Why would you waste your time on it?
If it fails, then great too! You find your value by solving the ticket, which can be a great example of where a human can still prevail over the AI (joke: AI companies might be interested in buying such examples).
(All assuming that your time costs more than the tokens. Totally different story if your wage is less than the token cost.)
Actually no. We ask business analysts to supply documentation for whole products. We use AI to analyze that documentation and after that we use AI to create tasks in Jira. Business analysts will review them.
After that we use AI to translate the tasks to a more technical view.
After that we use AI to implement the tasks.
After that we use AI to review the tasks.
After that a human QA tests the tasks.
If all is good, the code is merged and lands in production.
And yes, we burn a lot of tokens but the process is very fast. It takes months instead of years.
> Are they just copy/pasting their entire ticket description into Claude Code and having it iterate until they land on something that works?
There's also the pattern of creating an army of agents to solve problems. A human writes a plan. One agent elaborates on it. Another reviews it and makes changes. Another splits it up into tasks and delegates them out to multiple agents who make changes. Yet another agent reviews the changes, and on and on. All working around the clock.
If Uber is like most other companies, there's a leaderboard for AI tokens consumed. If maximizing your token usage is going to get you to the top of the leaderboard, and therefore promoted for "productivity", people are going to find creative ways to be "productive".
The tokenmaxxing leaderboard where I work has a lot of new hires on it
One thing that stands out is that it sounds like you're using LLMs for only one part of your process. You're having LLMs help you write code, but the code you're writing doesn't itself make use of LLMs.
My current job basically involves trying to improve processes that themselves make heavy use of LLMs. Once you have multiple agents in parallel running multiple experiments on improving the performance of primarily LLM driven tools it's not that hard to get your token usage pretty high.
Claude is a mediocre programmer that can do great things with great supervision, but it can't make mediocre human programmers into good ones, because they can't provide great supervision.
It will try and try and try, though.
I'd bet it's the LLM doom loop: vaguely ask it to do something, tab to news.ycombinator.com for 30 minutes, tab back, notice it misunderstood the prompt. Restart with a new, improved prompt, tab back to HN.
So yeah, probably the same thing people did anyway, except that instead of compile time it's now generating time.
There are several ways to burn that amount of money without specifically looking to tokenmaxx:
- Agents that spawn other agents
- Telling agents to go look at the entire codebase or at a lot of documents constantly
- MCP/API use with a lot of noise
- Loops where the agent is running unattended.
I do think it's not really responsible use and a loop where the agent is trying to fix CI for one hour for something that would take you five minutes (for example) is absurd. But people do that.
One of the new dynamics is a loop between a "code review" LLM and a "fix" LLM. It's super annoying because the code review LLM often finds more bugs on a follow-up review that were there from the beginning, but at least I can loop both until the checks go green.
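A review/fix loop like the one described above is easy to leave running unattended, so here is a minimal sketch of a capped version. Everything in it is hypothetical: `review` and `fix` stand in for whatever actually calls the two LLMs, and the round and token limits are arbitrary illustrative numbers, not anything the commenter described.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class LoopResult:
    green: bool        # True if the reviewer reported no findings
    rounds: int        # number of review passes performed
    tokens_used: int   # total tokens reported by both callables


def review_fix_loop(
    code: str,
    review: Callable[[str], Tuple[List[str], int]],    # hypothetical: returns (findings, tokens)
    fix: Callable[[str, List[str]], Tuple[str, int]],  # hypothetical: returns (new_code, tokens)
    max_rounds: int = 5,          # assumed cap, pick your own
    token_budget: int = 200_000,  # assumed cap, pick your own
) -> LoopResult:
    tokens = 0
    for round_no in range(1, max_rounds + 1):
        findings, used = review(code)
        tokens += used
        if not findings:
            return LoopResult(True, round_no, tokens)
        if tokens >= token_budget:
            # Stop instead of letting the loop spend indefinitely.
            return LoopResult(False, round_no, tokens)
        code, used = fix(code, findings)
        tokens += used
    return LoopResult(False, max_rounds, tokens)


# Stub "models" so the sketch runs without any API access.
def fake_review(code: str) -> Tuple[List[str], int]:
    return (["bug"] if "bug" in code else []), 1_000


def fake_fix(code: str, findings: List[str]) -> Tuple[str, int]:
    return code.replace("bug", "", 1), 2_000


result = review_fix_loop("bug bug ok", fake_review, fake_fix)
print(result)
```

The point of the caps is only that the loop ends with a report either way; whether five rounds or 200k tokens is sensible depends entirely on the task.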
I spend 400-500 dollars per day during active development at this point. However with more aggressive task breakdowns I can spend ~5k per day.
These spend rates are in part due to operating on a larger code base. Operating on a larger code base means more time searching and understanding the code, tests, test output. They are also due to going all-in on agentic coding.
It can feel painfully slow to go back to coding by hand when for a dollar you can build the same functionality in a minute. Now do this with multiple sessions and you can see where the cost goes.
Your reply answers how you are able to spend the money, not whether it is returning sufficient dollar value for the spend.
> I genuinely challenge someone spending $5-$10k a month to demonstrate how that turns into $50-$100k in value.
I've been working on a project to build a new Postgres based database in Rust[0]. I'm four weeks in and have 93% of the Postgres test suite passing. I've found agents to have worked really well for this as I have an existing codebase that has good architecture that I can point my agents at. It's also easy to debug as I can diff what my agents are doing and what Postgres is doing.
I've had to get multiple codex accounts, but there was a brief period of time where I tried API usage to see how expensive it would be. In about an hour I spent $650 of credits. I had codex estimate how much I would be spending if I was doing pure API usage and it estimated around $10k/week.
For context, Postgres is 1M lines of C code. It's looking like pgrust will come out at fewer lines of code than Postgres, and at peak I was adding over 100k lines of code in a day. I would estimate it would take a team of 5 software engineers at least 3 years to get to where I got in a month with a couple of Codex subscriptions.
[0] https://github.com/malisper/pgrust
> responsibly
There’s your problem. You’re trying to be responsible instead of trying to burn tokens so you can have your name on top of some leaderboard for most wasteful AI users.
The perverse incentives created by these AI leaderboards are crazy.
I don't use automated agent workflows or anything, I just use Claude as a pair programmer of sorts. A month or so ago I used Claude Opus 4.6 for 2-4 hours on API pricing and racked up $20 in spend, which surprised me since that was much higher than my usual.
I don't know about $10,000, but I can see hitting $1,000 pretty easily if you aren't looking at the costs.
It turns out writing good prompts helps to keep token usage down as the model wastes fewer tokens discovering context it needs that wasn't hinted at in the prompt.
Whereas a good prompt will give solid leads to all the specifics needed to complete the task.
>I genuinely challenge someone spending $5-$10k a month to demonstrate how that turns into $50-$100k in value
At a lot of businesses $5-10k/mo of AI spend doesn't even translate into $5-10k/mo of value. Churning out code was rarely the business value bottleneck. It was convenient for everybody else to blame developers not writing code fast enough for their failures. Now they have no excuse, but I doubt they will own up.
Multimedia feedback can burn much more than that, for example if I'm sending frames of a 3D engine's output. I would like to send it video if I could, but that is too expensive; still, I'm sure there are orgs out there that really do want every frame in a prompt doing something. This can be exponential depending on the application. I recently wrote a Milkdrop visualization analyzer. I could have sent thousands of frames for each one. I didn't, but, well, I wish I could, haha.
Yeah, I use Claude Code to do security reviews. For every CVE that Wiz flags, I have Claude Code do a reachability analysis.
I typically consume about $200/month doing this. Most of our engineers are in the $200-400 range, with a few people around $1,000.
But then there's one guy who's not only hitting $8,000, but supposedly has nearly 300,000 lines of code accepted (Note: This means he's accepted the lines of code from Claude, not that he's committed it). I can't figure out how.
Do lots of deep research and code reviews on large legacy codebases. I've created lots of documentation to reduce token consumption but it's still a lot of token consumption.
The answer may be agentic loops that keep cycling through the same problem again and again until they land on a non-erroneous outcome. Some people boast of having multiple such agents working in parallel on different problems, tending to one while another is processing, perhaps not unlike the movie mad scientist who runs around the lab throwing switches while laughing maniacally at the prospect of his impending success.
There was a tool posted called codeburn that showed a breakdown of what activity your usage was spent on. Mine was almost all coding but other people in the thread said >50% of their usage was conversation. I’m inclined to agree with you that someone who is reasonable with their compute usage is likely to be thinking things through rather than just burning tokens to get an LLM to solve the problem
In addition to what folks are saying here about larger code bases and multiple features at once, there’s also the time requirement to be efficient. It takes time to be more efficient with token usage and it may not be worth it for some of these companies so… burn away until we start to get more data and then we’ll check in.
> I just can't figure how _how_ to burn that much money a month responsibly.
I always have a few agents (2-5) doing research and working on plans in parallel. A plan is a thorough and unambiguous document describing the process to implement some feature. It contains goals, non-goals, data models, access patterns, explicit semantics, migrations, phasing, requirements, acceptance criteria, phased and final. Plans often require speculative work to formulate. Plans take hours to days to a couple of weeks to write. Humans may review the plans or derived RFCs. Chiefly AI reviews the code (multiple agents with differing prompts until a fixed point is reached between them). Tests and formal methods are meant to do heavy lifting.
In my highest volume weeks, I ship low hundreds of thousands of lines of software not counting changes to deps.
> At a corporate level, I'd much rather hire a junior engineer
Any formulation of a problem sufficient for a truly junior engineer to execute is better given to an agent. The solution is cheaper, faster, and likely better. If the latter doesn't hold, 10 independent solutions are still cheaper and faster than a junior engineer.
There is no longer any likely path to teaching a junior engineer the trade.
Just out of curiosity, what type of systems are you working on? What type of features did you implement on your 100k LOC week?
I am sorry, I am probably just very dumb, but this sounds extremely wasteful. If this is a reflection of how software was made before AI I wonder how anything was ever made.
> In my highest volume weeks, I ship low hundreds of thousands of lines of software not counting changes to deps.
I'm suspicious that you actually get Claude to output that much usable code in a week, but maybe you do.
But I’m 100% positive that you’re not shipping even a small fraction of the amount of value that someone reading this 2 years ago would have expected from hundreds of thousands of lines of code.
You will burn yourself out in months at that level of daily context switching.
It isn't worth it.
> In my highest volume weeks, I ship low hundreds of thousands of lines of software not counting changes to deps
But what do they actually do?
I keep seeing people wax poetic about the mountains and mountains of code that LLMs are dumping out, but I've yet to see anywhere near a proportionate amount of actually useful new apps or features. And if anything, the useful ones I do find are just more shovels for more AI. When do we get to the part where we start seeing the 10x gains from the billions of lines of code that have probably been generated at this point?
I dunno, I've seen agents make boneheaded mistakes even a junior engineer wouldn't make. Treating them as strictly better than junior engineers is a problem, not just for that reason but because you're effectively killing the pipeline for senior engineers. Then what?
On the OpenAI side, GPT-5.5 generates spend at a prolific rate that's even faster if you use it through an ACP connection in a tool like Zed. I used to never think about Codex rate limits and now I'm hitting mine every 5 hour block and spending ~$100/day on top of that in adhoc credit purchases.
I think companies are charged API prices vs individual prices. That alone is 10x for Anthropic. Not sure though.
Don't underestimate corporate waste. If it's not someone's job to care for something, they really won't.
Even before this AI wave, it was common for me to see spinning dev environments for like $3k/month that hadn't been used in months on AWS.
I use it as an IDE. I am a security engineer, but there are a bunch of predictable things I need to write code for: onboarding logs, writing detection rules, SOAR-type stuff. It makes a diff and locally tests all the permutations I describe, then I review the code.
I don't think it's about value. Tokenmaxxing is a thing now since that one CEO said he wants his $250k/yr devs to use $400-$500k/yr in tokens, so now it's all about how many agents can you have running concurrent tasks all day long.
In our org it's people that have too much stuff in their context, every mcp in the world installed, GTD, PAI, OpenClaw. I'm equally baffled how one can spend that much money during their day to day.
I also don't think a lot of people know some of the more advanced context management tricks like /rewind /fork /tree to take advantage of prefix caching
Your estimates do not account for speed of delivery. If an AI can deliver 10x faster, the target is less than 10x a dev salary.
But 10x faster also gets you to market sooner. Which has value.
I could argue about all the ways my personal experience disagrees, but let's just apply Occam's razor:
Most people agree big orgs regularly have dysfunctional incentives. We've seen it happen a thousand times.
Your suggestion requires we also assume a 10x faster delivery time by people spending $200 vs. $1,000, something I've yet to witness or hear a credible account of.
So while that might be true in a small number of cases, in general it's foolish to go with the "10x delivery speed" hypothesis.
1. Worktrees
2. Multiple simultaneous projects
3. Orchestration that includes handling of CI workflow
4. Active work to further improve or refine tooling
5. Experimentation producing muscle memory as experience versus code output
It turns into 50k to 100k or more of value for the employee the moment upper management made AI spend a personal performance target across most corporations.
They keep forgetting to put "make no mistakes", "think deeply" and "get it right the first time" in their prompts.
When people have no ability to understand what they are doing, they will just rerun it endlessly hoping they get something passable. When that doesn't happen they burn money.
I doubt most of this is from rerunning the same prompts over and over. This token burn is more likely from people using swarms of agents and orchestrators for “efficiency”.
“I’ve got 2 dozen agents churning through the backlog to build this feature that would take one agent an hour to implement.”
Key word doing A LOT of lifting: “responsibly”
At least your workplace doesn't frame raw usage as a leaderboard, with awards given out for topping it
You're probably generating new code rather than analyzing old code for "improvement".
> in deep thinking mode
You mean deep brute-force mode of search results parsing themselves…
$400 * 23 business days would be $9k. Sounds ballpark to me
Many companies actively hide the cost from their employees.
Do you run 20 Claude Code agents on Max for 8 hours a day? :)
a good way to prevent companies from adopting AI (and keeping your job) is to waste tokens making AI cost prohibitive
That would be true in a sane world with investors who value profitability. But everything is now focused on DAU and the network effect. Overusing their services might actually make them look better to investors who shovel more money to them to light on fire.
Idk about Uber in particular, but aside from legit programmers using AI to help them do legit work faster, there are people spamming it for metrics. And the hiring pipeline has gotten screwed up somehow, like half the people who reached the onsite interview for a technical role lied about all their technical skills, or they didn't lie and manage to pass hiring but then only take the tasks that AI can solo. And if it can't, waste tokens until giving up.
Advanced agentic prompting.
Try the Jira MCP server.
It really depends on the way you use AI. If you just prompt it for a task and either accept or reject the output, you won't spend much.
But if you are like me, you aggressively document and brainstorm before planning, you review that documentation with subagents, make modifications, you aggressively plan, you verify that plan with subagents, make modifications, have a large number of phases, plan again for each phase, write tests to cover 100%, implement each phase, do intermediate and final code reviews with subagents, apply fixes, write final documentation, and do all of these in parallel. If you have multiple tabs in your terminal, each running Claude Code for 10-12 hours a day, then $5000 per day is not much.
If you use an Anthropic or OpenAI subscription and you spend $1000 per month, you are not using AI much.
Are you bringing in at least $1.25M in additional yearly revenue to your company?
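The $1.25M figure in the question above presumably comes from annualizing the $5,000 per day mentioned in the parent comment. A minimal sketch of that arithmetic, assuming 250 working days per year (that number is my assumption; the thread does not state it):

```python
def annual_spend(daily_usd: float, working_days: int = 250) -> float:
    """Annualize a daily spend; 250 working days is an assumed figure."""
    return daily_usd * working_days


annual = annual_spend(5_000)
print(f"${annual:,.0f}")  # prints $1,250,000
```

With a different working-day count the total shifts proportionally, so the figure is best read as an order-of-magnitude check rather than an exact bill.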
And also some of us run tens of rounds of gradually improving the projects. And that burns tokens like crazy.
I spent $24,096.47 in "API" costs with my $200 Claude Code Max subscription in April.
I'm building my own SaaS. I spent 6 months writing the code by hand before using Claude, and that was fine, but it's much faster to give the exact specs to Claude and have 3-4 sessions working in parallel with me. When you validate changes with exact test specs there's much less correction you need to do. I always hit my weekly limit and it's far cheaper for me to use this than to hire someone and spend time onboarding them.
> I use llms daily
this is your “problem” - you are missing the “nightly” part. on my box LLMs run 24/7 :)
Slop architecture leads to compounding problems that people try to solve with more slop. If one wants to control the quality of the code then the throughput and multithreading is bottlenecked by how much code one can comprehend in a given period of time.
> notice more and more are consumed $1k in tokens a month
I've said it before: if you allow people to see how much others spent, they will try to climb up the "leaderboard".
It takes just ONE little praise for using tokens or one perk gained, and the GAME IS ON among the developers!
I used CC frequently for development, Opus 4.7 with high thinking, with a $100 Max subscription, and haven't been rate limited yet. IMO a subscription is the way to go as it puts a ceiling on spending.
> I just can't figure how _how_ to burn that much money a month responsibly.
Well, if your bonus depends on spending it, you'll find a way.
My observation is - pasting long documents is a great way to burn tokens. Turn based conversation, even a very deep and technical one, consumes less tokens than "read these logs and tell me where the problem is". Ironically, the log reading example is a perfect use for a local LLM.
You are probably guiding them step by step and reading the results. Maybe you also sit and wait for the results.
Agents can iterate on a problem for hours if they can see their results and be given a higher level goal to evaluate their progress toward.
When you have an agent working for minutes or hours, never wait on it. Use that time to spin up another agent.
You can also spin up several agents in parallel to attempt the same item of work and compare their results to choose which to work off for next steps, instead of rolling the dice on a single option at a time and gambling that it's better to refine that first attempt instead of retrying from the start several more times.
And if you are doing QA manually, you're missing out on having e.g. Codex's "Computer Use" or "Browser Use" automate your manual verification steps and collect a report for you to review more quickly. Codex can control multiple virtual cursors simultaneously in the background without stealing focus, to parallelize this.
If you want to use up more tokens to get more done (though more outside of your control and ability to review of course), that's how.
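The "several agents attempt the same item and you pick the best" idea above can be sketched as a best-of-N run. This is only an illustration under stated assumptions: `attempt` and `score` are hypothetical placeholders for an agent run and for whatever evaluation you trust (for example, the fraction of tests passing), and the stubs below exist only so the sketch executes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def best_of_n(
    task: str,
    attempt: Callable[[str, int], str],  # hypothetical: one agent run, varied by index
    score: Callable[[str], float],       # hypothetical: higher is better
    n: int = 3,
) -> Tuple[str, float]:
    """Run n independent attempts in parallel and return the best-scoring one."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        candidates: List[str] = list(pool.map(lambda i: attempt(task, i), range(n)))
    scored = [(c, score(c)) for c in candidates]
    return max(scored, key=lambda pair: pair[1])


# Stubs standing in for real agent calls and a real evaluator.
def fake_attempt(task: str, seed: int) -> str:
    return f"{task}-v{seed}" + "!" * seed


def fake_score(candidate: str) -> float:
    return float(candidate.count("!"))


best, best_score = best_of_n("patch", fake_attempt, fake_score, n=3)
print(best, best_score)
```

Note that n attempts cost roughly n times the tokens of one, which is exactly the trade the comment is describing.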
It's easily explained. People are losing their skill in real time and literally cannot develop anymore without AI. That's it.
I'm working on some serious data analysis + realtime async code, and I use 200-400 million tokens a day with Claude Code alone (via ccusage). The complexity of the code seems to have a big impact on the number of tokens used. On simpler projects I use many fewer tokens.
My programming endurance is much greater now (2-3x focused hours per day), my productivity per hour is multiples higher, and I code seven days a week now because it's really exciting.
All told, I would pay for these tools as much as I would pay for full-time human programmer(s).
> I'd much rather hire a junior engineer who spends $100-$200/month
I'd much rather hire a junior engineer at $1.20/hour too! Can you hook me up with your contract services provider?
Obviously I know you're talking about AI costs only. But the idea of doing that analysis without looking at the salary of the person running the tool seems to be completely missing the point.
Now, sure, there are legitimate arguments to be made about efficacy and efficiency and sustainability and best practices. But, no, $100k/year absolutely doesn't need to be "justified" if it works. That's cheaper than the alternative, and markedly so.
> But, no, $100k/year absolutely doesn't need to be "justified" if it works. That's cheaper than the alternative, and markedly so.
If you're trying to say that 100k is less than 200k, you're right.
I don't see how any of that won't need to be justified. You can spend a lot of money and not get enough of a return...
In your fictional world you hire a junior who will write code manually, right?
First, I interview people. Junior skills in manual coding dropped sharply this year. These are people who started their schooling coding manually and switched mid-course. In two years there will be no such people.
Well, that will never happen anymore in this world unless we go back to caves, especially for juniors. A junior that writes good code is already a dying unicorn.
The outcome will be ... you will hire a junior ... who will burn more tokens, and the chances of mistakes with a less expensive model and fewer tokens are even higher.
Puh, not good signs at all.
I mean even the normal people we get in interviews have no clue, like 80% are just ignorant.
I stopped an interview after 5 minutes: when I asked what ls -ahl does, he started telling me how he vibe/AI codes stuff and that's his workflow. Okay, if you don't know the basics, guess what? Anyone can replace you, or at least I'm not hiring you (I only told him that's not what we are looking for and thanked him).
we are doomed :D
> Well, that will never happen anymore in this world unless we go back to caves
The bubble is an echo chamber.
The fully loaded cost of a senior engineer is already well past 400k. +5k a month is not that much if it helps them be XX% more productive. Personally at a different big tech I'm in the mid 4 digits AI spend per month and it helps me a lot, basically all coding has been trivialized and I work on an extremely large codebase. I'm spending more time on things closer to direct value generation like data analysis and experiment tweaking rather than spending time moving a variable across 10 layers of abstraction and making sure code compiles.
I know I'm responding to AI right now, but
> which means figuring out if the company can afford this level of productivity at scale.
If it was actually productive, then the revenue would increase and affordability wouldn't be a question.
Yes, my thoughts exactly. Productivity by definition creates things, hopefully valuable things. Is all the extra burn on chatbots worth the cost? Has Uber somehow gotten dramatically more efficient and effective due to this massive budget overrun? Or have they just given people shiny and expensive ways to push the same work around?
> If it was actually productive, then the revenue would increase and affordability wouldn't be a question.
Revenue has increased. Have you seen Meta's latest earnings? +33% revenue - in this economy.
Affordability is not a question. There is a reason companies like Meta have no issue with their engineers spending $1k/day on tokens. It's just not that much compared to how much they make per employee.
How can that be attributed to any code an LLM wrote?
>$8 billion of net income was the result of a tax benefit the company realized in the first quarter of the year.
So exactly how much of their revenue is because of any code LLMs wrote vs. just structural tail winds?
After losing 20 million users? https://www.theverge.com/tech/921089/meta-earnings-q1-2026-u...
I really don't understand their economics.
This article is about Uber, not Meta
That means absolutely nothing in the context of this conversation. It says right in their release that ad impressions are up almost 20% and cost per ad is up 12%. Those two metrics alone account for most of the increase in their revenue. Absolutely no conclusion can be drawn regarding the impact of AI on those numbers one way or the other.
It's not like they used AI to crank out some new revenue generating piece of software, or massively reduce operating costs. In fact their operating costs rose by 35%.
Not every change a developer makes increases revenue, and the changes that do often have a lag time.
I'd argue it's often the contrary: since it's easy to ship features and fixes, people often ship things without questioning whether it makes business sense to support a use case, or whether the design is solid. Now you have exactly the same revenue but more things to maintain.
This is my thought too. The eggheads in accounting set budgets, and we produce products within that budget. I could be twice as productive with twice as many people, and maybe 50% more productive with good AI, but if it's not budgeted for it's an issue (especially short-term before the product is released).
Steelmanning the other side: a counter example would be if competitors use the same tools to achieve the same productivity gains.
> If it was actually productive
They are extremely productive if you use them right. To the point it worries me how clever these pseudo-AI models can get in the next year.
That is not true at all. How "productive" a company is means nothing if people aren't buying your product. And using LLMs to be more productive will not convince anyone to buy your product. Human creativity and intuition to make a product that people want to use is what sells. Productivity for productivity's sake doesn't really move the needle at all, and can make things worse.
> 95% of Uber engineers now use AI tools monthly with 70% of committed code originating from AI.
Well, that’s to be expected when using AI tools becomes relevant in your performance evaluation.
It's actually incredible the extent to which non devs imposing KPIs on devs underestimate how badly this will get gamed, whether it's AIs, PR/line counting or whatever.
Gaming is one thing, fundamentally not understanding how engineering works will lead to shittier outcomes and cost the company in ways the management will never understand.
Management in the age of AI is falling for the doorman fallacy wrt engineering. If lines of code were the most valuable aspect of software engineering, my front end JavaScript intern would’ve been the most valuable person in the company. https://www.jaakkoj.com/concepts/doorman-fallacy
Exactly. At Cerebras I know of several people who burn tokens on completely USELESS tasks (randomly changing pixels in an image) just to keep them high up on the token leaderboard.
I suspect the other tokenboard leaders are doing the same. They made the metric "token usage" (which is just a proxy for LOC) so that's what they're gonna get.
Someone at my job uses AI tools to reformat his code...
I think PRs are a pretty good metric, IF
1. you sample a few to see that they are actually meaningful,
2. they go to prod and are validated without having to roll back.
Still needs to be managed. But it should be much easier for a manager to catch an engineer gaming PRs than something like AI use or lines of code.
Easily fixable with another KPI to measure the gaming itself :P
When managers and VPs all say, you must use AI or else you will not work here, then yes, people will use it.
yeah and once the KPI is "how much AI did you use" instead of "what did you ship," the budget blowout writes itself. people will game the number.
I don't understand this critique. (1) Did you previously think you weren't getting paid for doing what a company wants you to do, aka what THEY thought was productive? (2) Do you think all this AI generated code is useless?
Edit: y'all are some whiney folk, ain't ya?
I think the point was that, when you make a metric goal of "you must use AI this much", then people will use AI even in ways that aren't adding to productivity.
To answer your second question: Yes, much of it is worse than useless. The tools need guidance to produce useful output. If you use it poorly, you will get garbage output that may do more harm than good.
And your response does not address the point being made in the comment you replied to: Many people are being evaluated by how many tokens they burn, which is about as good a metric as lines of code written.
1) I think if the company I work for spends too much effort on things that aren't going to make money, they won't be able to pay me anymore, no matter what they "think" is productive. That's not how executives at companies like this make decisions, though.
2) Mostly, yes.
I think parent is saying "% of code being generated by AI" is not a generally good, direct metric for business value. It's akin to the "we are pushing SO MUCH CODE" phase of early ai marketing.
If we're trying to measure the value of adopting a tool, it's probably better to measure the ROI of that tool rather than the usage % of that tool, especially when usage is basically mandated.
To directly answer your questions:
1. You're being paid to create value for the business, which "doing what they think is productive" is a proxy for. You're not being paid to use a tool a high % of the time.
2. It doesn't seem like parent even commented on the quality of the code generated. I think anyone that uses it regularly can agree that: a) the code is not useless, b) not all generated code is immediately production ready, and c) AI generation of code is an accelerant for software development.
Goodhart's Law isn't a problem immediately. If you want more code to be written, and the only feasible way to write it to goals is to heavily use AI, then you might run into the problems of AI-generated code, and an infrastructure that's poorly architected and much less understood than it would've been ten years ago.
Not OP, but:
1. At my level, the company is not just paying me to do a task the way they want it done, they are paying for my experience to orchestrate the best way to do it. They want an outcome, and I'm responsible for figuring out how to get to that outcome with the right balance of cost, correctness, etc. But yes, the most dystopian reality is what you said.
2. It's not useless, but the AI-generated code is absolutely lower quality than what I would have written myself, and there is no desire to clean it up. Companies have always had a disastrously bad understanding of technical debt, and they finally have a tool they can shove down developers' throats that trades even more velocity for even less quality. They're going to take that trade every single time.
> (1) ...getting paid for doing what a company wants you to do...?
At my previous company, when the thing they thought they wanted me to do (which was not the thing they actually wanted... but whatever) diverged from my values I quit. You can just do things.
> (2) Do you think all this AI generated code is useless?
Almost universally, yes. Especially in organizations that historically haven't been particularly careful about hiring and have a huge number of young, inexperienced people. There are exceptions but they're rare enough that throwing that particular baby out with the bathwater isn't a big loss.
you're missing their point; LLM use is often a part of your evaluation at some of these larger companies and they expect you to use them heavily or you will get a lashing
GP is just saying that any metric will be gamed, and if there is a cost associated with it, that cost will grow. Let’s say you set a metric that says the most productive devs are the ones with the most files changed: you can soon expect every function and structure to be in its own file. Same if you say that sales commissions are based on how much time you spend calling: expect the phone bills to grow a lot.
I love how these articles drop, and all of a sudden HN is filled with people who think engineering productivity is simple to measure.
Yes, productivity implies revenue (or cost reduction), and revenue is measurable.
However:
1. You spend money today to build features that drive revenue in the future, so when expenses go up rapidly today, you don’t yet have the revenue to measure.
2. It’s inherently a counterfactual consideration: you have these features completed today, using AI. You’re profitable/unprofitable. So AI is productive/unproductive, right? No. You have to estimate what you would’ve gotten done without AI, and how much revenue you would’ve had then.
3. Business is often a Red Queen’s race. If you don’t make improvements, it’s often the case that you’ll lose revenue, as competitors take advantage.
4. Most likely, AI use is a mixture of working on things that matter and people throwing shit against the wall “because it’s easy now.” Actually measuring the potential productivity improvements means figuring out how to keep the first category and avoid the second.
This isn’t me arguing for or against AI. It’s just me telling you not to be lazy and say “if it were productive you’d be able to measure it.”
> HN is filled with people who think engineering productivity is simple to measure.
I think the prevailing (correct) consensus is that developer productivity is actually very hard to measure, and every time it is attempted the measure is immediately made a target making the whole thing pointless even if it had been a solid measurement- which it wasn't.
IDK where you're getting the idea here that measuring productivity of anyone who isn't a factory worker is easy.
I do not think it is easy, like I said. I am saying other people are acting like it’s easy.
See the second comment on this article. https://news.ycombinator.com/item?id=47976781
See @emp17344 responding to me.
Is it easy to measure a factory worker's productivity? It would seem surprising and interesting if every job's productivity is hard to measure except for one particular kind.
> You spend money today to build features that drive revenue in the future
Totally but new features in their app or better software are not going to increase Uber's revenue/profit significantly.
This is the message that somehow the tech industry is constitutionally incapable of absorbing. The "innovation impulse" is cancer. I have no idea why tech managers keep harping on about "innovating", it's so bizarre.
I mean, the option is not zero productivity or some productivity: it could be negative.
We doubt the productivity because we have enough experience with Claude Code to know that flooding your organization with that many tokens isn't just unproductive, it's actively harmful.
Minor shifts in productivity are hard to measure. Major jumps in productivity would be obvious. I think it’s clear that, if AI is affecting productivity, it’s to a minor degree at best.
i think it will make things go backwards.
the big leaps in productivity come from really great ideas that are formalised into concepts that then take form.
this comes from being in a meditative state. not blasting output at a higher rate.
If it were 10x productive you'd be able to measure it indirectly, you'd be unable to avoid measuring it. So the initial claims were clearly lies. The research question is:
I agree that's very hard to measure. But given what this shit costs, it had better be answerable, and the multiple had better justify the cost.
> figuring out if the company can afford this level of productivity at scale
This is the thing that boggles my mind. They spent their budget. They have 4 months of data. What do they have to show for it?
I'm not a hater; I'm not a luddite. I have a $200 Max plan and I use it.
But are you saying that Uber made this tool available, urged everybody to use it, and is confused about what happens when it worked? It's one thing if they decide AI isn't productive enough to be worth the cost.
Are they out of ideas on what to build next, or something?
The personal max and teams plan actually are an amazing bargain compared to the API PAYG cost you get with Enterprise. I guess they really need their Enterprise features though, otherwise they could just tell users to expense a $200 max sub. Enterprises gonna Enterprise.
Enterprise gets you the written agreement that the data you send to Claude will never be used for model training.
> What do they have to show for it?
My guess is nothing you can see right now, since it likely takes a lot longer for any substantial external-facing changes to roll out broadly. Internally I'm sure several features have moved faster. I've noticed this at Salesforce where it certainly seems like things that would have taken a few weeks take a few days now. This doesn't translate directly to more money, just more potential to make money.
What I don't understand is there are really good controls for spend, why on earth didn't they put caps on?
Or ask engineers to justify the spend?
Why should we spend that many tokens, what will that get us in return?
If this was AWS we'd all be pointing and going "Ahhhh you twats, didn't you look at your monthly spend?"
> Are they out of ideas on what to build next, or something?
Well, what is there for Uber to build next? They have their ride hailing platform. It works. They have adapted it for other kinds of delivery (food, groceries, "anything that fits in a car") What else is there in the "someone driving a car" space for them?
Car Fleets, professional freight logistics.
There are a lot of things to do in the "someone driving a vehicle" space. The other obvious business (that they exited) is self-driving, of course.
> I'm not a hater; I'm not a luddite. I have a $200 Max plan and I use it.
I'm glad to see we've reached the point of AI discourse at which anything that might be construed as criticism must be prefixed by "I'm also part of the cult, I'm not a non-believer, but" to avoid being dismissed as a heretic.
Since AI has become a partisan political football it makes sense
Speaking as someone who's bootstrapping here, I'm often envious of engineers at these larger companies, but I also worry that the incentives are screwed up.
If I were an engineer at Uber, why wouldn't I select gpt 5.5 pro @ very high thinking + fast mode for a prompt? There's no incentive not to use the most powerful (and thus most expensive) model for even the smallest of changes.
I tried one of these prompts for some tests I'm doing for image->html conversion, and a single prompt cost me $40. For someone that's paying that themselves, I'd pretty much never use this configuration. For someone at a large company where someone else is footing the bill, I'd spin these up regularly (the output was significantly better, fwiw). For engineers they're being rated on what they deliver, not the expenditure to get there.
There are ways to do this cheaply, but there are no incentives for engineers to do so.
SWEs are expensive; median salary is $133k (not counting health insurance, payroll taxes, etc). If you can shave off an hour of dev time with $40 in LLM credits, that's $26.50 cheaper than having them do it without.
I'm not entirely convinced it works out that way so far, but that's the theory.
Trying to bring down LLM costs is sort of a double-edged sword, because the dev needs to be cutting LLM costs by more than what you're paying them. If it takes them a day to bring costs down by $1 an invocation, then it takes almost 2 years to recoup the salary costs. It's worse because LLMs currently change so much I wouldn't be confident that their solution won't be broken before the 2 year period. Will we still be tool calling in 2 years, or will that be something new? Will thinking still be a thing, or will it be superseded by something else? I don't think anyone knows, even the frontier providers.
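For anyone who wants to check the arithmetic in this comment, here is a minimal sketch. It assumes 2,000 paid hours per year and 8-hour days, neither of which the comment states; those are simply the values that reproduce the $26.50 figure. The function names are made up for illustration.

```python
# Assumptions (not stated in the thread): 2,000 paid hours/year, 8-hour days.
MEDIAN_SALARY = 133_000   # from the comment; excludes benefits and payroll taxes
HOURS_PER_YEAR = 2_000    # assumed: 50 weeks x 40 hours
HOURS_PER_DAY = 8         # assumed


def hourly_rate(salary: float = MEDIAN_SALARY) -> float:
    """Salary-only cost of one engineer hour."""
    return salary / HOURS_PER_YEAR


def saving_per_hour_replaced(llm_cost: float) -> float:
    """Net saving if llm_cost in credits replaces exactly one hour of dev time."""
    return hourly_rate() - llm_cost


def invocations_to_recoup(days_spent: float, saving_per_invocation: float) -> float:
    """Invocations needed before cost-optimization work pays for itself."""
    return days_spent * HOURS_PER_DAY * hourly_rate() / saving_per_invocation


print(hourly_rate())                  # 66.5
print(saving_per_hour_replaced(40))   # 26.5
print(invocations_to_recoup(1, 1.0))  # 532.0
```

How long 532 invocations takes depends on volume, which the comment doesn't give. At roughly one invocation per working day it lands near the "almost 2 years" quoted; at hundreds per day the payback is almost immediate.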
> If you can shave off an hour of dev time with $40 in LLM credits, that's $26.50 cheaper than having them do it without.
This assumes that that hour shaved was used elsewhere productively which is not the case.
We ended up using a service like yakpdf, for HTML to PDF generation.
It handled most of the rendering issues out of the box compared to headless browser setups.
Companies may first want to see how fast you can scale work and then trim it back down for efficiency.
How could they implement it? Try testing a bunch of models (closed and open source) and then seeing which one gives the best returns for its cost? And then how do they check if it's being properly used? I have read of people just throwing their token budgets on the fire so that they show high usage for KPIs. While the most obvious cases of "X does this very wasteful thing" will be culled quickly (hopefully), I don't see how non-technical management can see through the thinnest layer of malicious compliance.
image->html is a pretty involved task though. That’s basically a frontend dev’s job. $40 wouldn’t cover an hour of their time.
According to [1], there are about 5500 people in Engineering at Uber. Using $1250 as the mid-point of the $ spend range, that comes to about $6.8 Million in engineering AI spend, ballpark, with the range being $2.75 Million - $12 Million. The article lists $3.4 Billion as the R&D spend.
The AI spend does not appear to be a significant chunk of R&D spending (0.3% in 4 months or 1% annualized). If they didn't plan for it, sure, it's not peanuts in the budget, but in context not that much.
The real question is, what did they get for that amount? The article claims that 70% of the code commit is now AI-generated, so presumably the code passed review and tests. Did it accelerate the feature count? did it reduce quality problems? Did it lead to other benefits?
Sadly the article is silent on the outcomes, besides the higher spend.
Maybe 4 months is too soon to assess the benefits. On the other hand, in an agile world ...
[1] https://www.unifygtm.com/insights-headcount/uber
The actual source https://www.theinformation.com/newsletters/applied-ai/uber-c... says "about 11% of real, live updates to the code in its backend systems are being written by AI agents built primarily with Claude Code, up from just a fraction of a percent three months ago" and "He wouldn’t disclose exact figures of the company’s software budget or what it spends on AI coding tools."
Everything in this article is purely fake. The numbers don't add up, don't match any reported info, and are just fiction.
> that comes to about $6.8 Million in engineering AI spend
That would be per month. Per year it would be $81.6M.
A small fraction compared to the R&D budget but still a huge amount of cash to spend on something with (apparently) very little impact on the whole business.
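The back-of-envelope numbers in this sub-thread can be reproduced with a short sketch. The inputs are the ones quoted above (5,500 engineers, $500 to $2,000 per engineer per month, $3.4 billion R&D); treating the R&D figure as annual is my assumption, and the function names are illustrative only.

```python
ENGINEERS = 5_500                    # headcount cited in the thread
SPEND_LOW, SPEND_HIGH = 500, 2_000   # monthly $ per engineer, as quoted from the article
RD_BUDGET = 3_400_000_000            # R&D spend cited in the thread; assumed to be annual


def monthly_spend(per_engineer: float) -> float:
    """Total monthly AI spend if every engineer spends per_engineer dollars."""
    return ENGINEERS * per_engineer


def annual_share_of_rd(per_engineer: float) -> float:
    """Annualized AI spend as a fraction of the R&D budget."""
    return monthly_spend(per_engineer) * 12 / RD_BUDGET


mid = (SPEND_LOW + SPEND_HIGH) / 2
for label, per_eng in [("low", SPEND_LOW), ("mid", mid), ("high", SPEND_HIGH)]:
    month = monthly_spend(per_eng)
    print(f"{label}: ${month / 1e6:.2f}M/month, "
          f"${month * 12 / 1e6:.1f}M/year, "
          f"{annual_share_of_rd(per_eng):.1%} of R&D")
```

With these inputs the mid-point is about $6.9M per month, or roughly $82.5M per year and about 2.4% of R&D. The top of the range comes out at $11M per month rather than $12M, and the annualized share is closer to 1% to 4% than to 1% flat.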
I think as it becomes more common for executives to think we can replace software engineering with agents, I wonder if they might be basing their decisions off of unrealistic perceptions of the average software engineer. I guess I'm mulling two somewhat contradictory senses:
1. You get out of it what you put into it. A savvy CTO might be incredibly excited by everything they can do with agents, and improperly think that all the software engineers can do the same thing, when in reality your org's average software engineers might not have the creativity to even think of many cases where it could save them work. So by mandating agent usage, you might find that productivity hasn't improved while AI costs have increased.
2. When using AI, there are two gaps that become more obvious. First is the gap of: who tells the agent what to do? In many orgs, product isn't technically savvy enough to come up with a detailed spec/plan that LLM can use. And many cog-in-machine developers aren't positioned to come up with the spec, they just want to implement it. By expecting work to be implemented by agent-using developers, you might instead find a lot of idle workers waiting for work to show up. Second is the qa/review cycle. You've introduced a big change to the org but are you really saving cost or shifting it?
I'm all for introducing LLM as optional to help existing developers increase velocity and quality, but I think the "let's restructure the org" movement is really dicey, especially for mid-size or smaller employers.
> You get out of it what you put into it.
Beyond that, it's a force multiplier and it doesn't care if the force is positive or negative. Someone with poor software engineering principles can use AI to make an absolute mess quickly.
Related to 2, my company is strongly pushing for developers to have a product mentality and be less of just a cog in a machine.
I am biased because I have more of a product mentality than other developers, but I think these are the people better positioned to be more productive with agents: know enough tech to be able to implement things with agents, and know enough product to know what should be implemented.
I expect other companies to follow.
You're basically arguing for massive headcount reductions.
How so?
What is Uber developing? They're an app and a car allocator back end. Both work OK. Why are they spending so much?
They gave up on self-driving, so that's not it.
This is a really underrated comment. It’s a great question and speaks volumes as to what the hell so many modern tech cos are actually doing with all their resource. Didn’t Elon strip away most of the team at Twitter, and after some awful false starts, it pretty much ran fine on about 80% less human resource?
There’s a difference between:
1) the minimum number of employees it takes to maintain the core product
Vs 2) All the employees that it makes sense to hire for revenue and market expansion.
Internet comments usually assume that (1) is the goal. But think of say the sales department. If every salesperson you hire brings in new company revenue that’s greater than their salary + overhead, then why not hire 1000 of them?
> Both work OK
If only. The optimizations they do on their matching algorithm have made the UX so terrible, I regularly use Lyft instead now.
this is the most tired hn comment ever
"X is just Y - why is it so complicated?"
its lazy and boring to read these on every thread about a disliked big company
It's very easy to blow through hundreds of dollars a session using API tokens especially with the 1m context if you aren't careful about clearing old context.
At the same time the subscription will allow the same usage for hundreds of dollars a month.
Either Anthropic is absolutely hosing API users, massively subsidizing subscriptions, or a little bit of both.
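A rough sketch of why uncleared context gets expensive on pay-per-token billing. It assumes every request re-sends the whole conversation as input, and it ignores prompt caching, output tokens, and the cost of producing the summary itself. The price, turn count, and token sizes are hypothetical, not any provider's list price.

```python
from typing import Optional

PRICE_PER_MTOK = 5.00  # hypothetical input price, $ per million tokens


def session_input_tokens(turns: int, tokens_per_turn: int,
                         compact_every: Optional[int] = None,
                         summary_tokens: int = 2_000) -> int:
    """Total input tokens billed when each request re-sends the full history."""
    context = 0
    billed = 0
    for turn in range(1, turns + 1):
        context += tokens_per_turn   # new prompt, tool output, and reply join the history
        billed += context            # the whole history is sent as input again
        if compact_every and turn % compact_every == 0:
            context = summary_tokens  # history replaced by a short summary
    return billed


def cost(tokens: int) -> float:
    return tokens / 1_000_000 * PRICE_PER_MTOK


long_lived = session_input_tokens(200, 5_000)
compacted = session_input_tokens(200, 5_000, compact_every=20)
print(f"never cleared: {long_lived:,} tokens, ${cost(long_lived):,.2f}")
print(f"compacted every 20 turns: {compacted:,} tokens, ${cost(compacted):,.2f}")
```

Under these made-up numbers the never-cleared session bills about 100 million input tokens against about 11 million for the compacted one, because billed input grows with the square of the turn count when nothing is dropped.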
https://www.forbes.com/sites/annatong/2026/03/05/cursor-goes...
"Cursor estimated last year that a $200-per-month Claude Code subscription could use up to $2,000 in compute, suggesting significant subsidization by Anthropic. Today, that subsidization appears to be even more aggressive, with that $200 plan able to consume about $5,000 in compute"
Really curious how many people actually get close to that level of usage? Their general business plan only offers the $100 version, with pay-as-you-go above that.
If 95% of people are using $100 of value a month, the whales may not be hurting them that badly.
That’s based on Anthropic’s retail price right? Not a fair comparison, like saying that Netflix must be losing money because every movie rental is $4 and a Netflix subscriber can watch 20 movies in a month.
Anthropic has a very "interesting" business model where you get subscription pricing as long as you are under 150 employees. When you hit 151, you have to start paying API prices overnight for everyone, and your total bill instantly multiplies.
They are getting you hooked on cheaper tokens, then raking you in when you get scale. I'm sure Uber gets a break on list price, but I doubt they are anywhere near <150 employee subscription pricing.
You have to remember that enterprise pricing is covered by NDAs
But things to note:
1) the per user license fee is almost certainly waived.
2) if you look in teams, when you buy extra credit, you get a 30% discount if you buy in bulk.
Unless you default into enterprise from teams, you're almost certainly not going to pay the list price per token.
Is that known to be true? Enterprise pricing is opaque. I am aware of at least one 151+ organization using a flat-cost $150-200/mo per seat Claude Premium contract. Reportedly most employees don't need to top up with additional API usage to cope with token limits.
Strange pricing model for a company selling the idea of having fewer employees.
Yeah, it's basically the opposite of how "product-led growth" SaaS works. Generally pay-as-you-go pricing is expensive at scale, but attractive initially. So you start on a pay-as-you-go plan, but as you scale you end up transitioning off pay-as-you-go to a negotiated commit. I.e. you call sales and sign a contract. Anthropic basically flips that around backwards.
I evaluated the pricing and could not justify the jump to Enterprise from Team. You lose the monthly subscription entirely when you jump to enterprise so you lose your ability to control costs.
You can cap per user, but without the rolling cap, are you really just going to tell a member of your team “No AI for the rest of the month”?
It’s a risky deal as it sets up now IMO.
This Claudemaxxing phenomenon is amusing as hell.
I've been able to get by with the $20pm Pro subscription and reap great value out of Claude Code.
I feel like it really is about:
- Don't feed it the works of Shakespeare into the context window if all it's working on is a few files. I actually don't have a Claude.md file in my projects.
- I write the prompt as if I was giving instructions to another developer or to myself on how I want to approach a specific coding task, with a numbered step plan. I've actually been able to take the details written into a Jira ticket on a work project, feed it into Claude Code, and get really good results from it.
- If you are responsible for the output, then you need to review the output - that does put a natural constraint on the tool's usage, but ultimately it is you who uses the tool, not the other way around.
I feel like that's the thing - you have to find the right cadence, just like with running or driving a car - you need to find the level at which you control the car, at which you maintain a consistent pace, and at which you get code that does what you need it to do and meets the quality threshold you want.
Can these AI-generated articles not be prompted to at least cite the primary sources? How do I know any of this is true?
Here's a much better article: https://aimagazine.com/news/why-uber-has-already-burned-thro...
The OP isn't a good article, but this one is about an entirely different subject?
Ah sorry, it's one of those annoying websites that automatically load another article when you scroll down too far. Updated the link.
Have we reached a point yet where companies are spending millions a year on software licenses, cloud and AI to the point where the return isn't worth it?
Years ago I did work for a company that was spending over a million on Oracle product licenses and I was part of the consultant team they hired to rip it all out and just go for simple maintainable code based on open source products. Not only did it transform into a codebase that the average newly hired developer could maintain, you also had the savings of not paying Oracle a significant portion of your revenue.
I feel like that will repeat itself in a few years time with the current cloud and AI train everyone is on.
I haven't been in a professional setting for a while, I just code for fun nowadays so perhaps I'm somewhat out of the loop.
> Uber's unexpected budget burn matters because it signals how valuable AI tools have become to engineering productivity.
This infers value from spend, which makes no sense. Burning the budget tells us engineers like the tool, not that it's producing value.
Show me how to make two dollars whilst spending one, and budget isn't a problem.
It's obvious that the word productivity has been used in this discussion to mean something other than the plain meaning of the word. If AI was productive, there would be no question about whether it could be afforded. If you're asking whether you can afford it then it isn't productive by definition.
They are using it to mean a mechanism that produces prodigious amounts of toxic waste. That does not conform to the historical understanding of the word.
> Monthly API costs per engineer ranged from $500 to $2,000 as adoption skyrocketed across the company.
That's...not exactly a lot per engineer. It sounds like they just didn't budget correctly. Especially if the net of that work is more features that would have otherwise required hiring more engineers, which would cost a lot more than $500 to $2000 a month.
It's a lot. It's a lot for being able to generate that many tokens.
And I'm not talking about some genius 10x developer who is working with multiple git worktrees on x tasks in parallel at high quality.
No, it's really not a lot at all, especially if you've got a mandate to maximize your AI usage, which many engineering orgs have right now. I burned $216 USD using Claude Code in March just doing some casual development on the side and certainly not as a part of any professional workplace mandate.
Wonderful, so when will I see novel features in my Uber app?
if you mean novel bugs then probably at the next app update
Hahaha.. good one :D
You can now reportedly book a hotel from the Uber app...which is totally a useful feature that I'm sure everyone will start to use /s
https://investor.uber.com/news-events/news/press-release-det...
I didn't know this. There's a term for this, which every one of us now knows: enshittification.
Relevant Pragmatic Engineer newsletter with many more cases along these lines, along with how some people are handling them: https://newsletter.pragmaticengineer.com/p/the-pulse-token-s...
Tokenmaxxing seems more and more like a way to encourage experimentation and learning, and incidents like this are a part of learning. Like, today devs simply use the most expensive model by default, even to do extremely simple things. This is obviously wasteful and costly, and budgets will soon be imposed, but this is how they're figuring out the economics.
For instance, like we estimate story points, we may estimate token budgets. At that point, why waste time and money invoking a model for a simple refactor when you could do it with a few keystrokes in an IDE? And why use a frontier model when an open-source local model could spit out that throwaway script? Local models can be tokenmaxxed, but frontier models will still be needed and will be used judiciously. Those are essentially trade-offs, and will eventually be empirically driven, which is what engineering is largely about.
So economics will soon push engineers back to do what they're paid to do: engineering. Just that it will look very different compared to what we're used to.
This is the first time I heard about estimating tokens for a task.
I feel like you’re on to something. Management will pick this up, and make it part of the sprint planning.
Engineers will pull out their hair wondering how you can do that.
That’s like estimating how many CPU cycles a task will take. How many instructions will your laptop use while you work on something.
Yeah, and I expect estimating token budgets is going to go the same trajectory (along with the same accompanying annoyances) as estimating and tracking story points!
But done with the right mindset and proper awareness of the inherent uncertainty, you can sometimes achieve some reasonable estimates over time by starting with some T-shirt size estimates and then adapting based on actual numbers. Soon enough the team gets a sense of the nuances of the projects and its dependencies, and estimates get more accurate.
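As a sketch of what "start with T-shirt sizes and adapt based on actual numbers" could look like for token budgets: the starting guesses, the smoothing weight, and the class name below are all hypothetical, and an exponential moving average is just one simple way to do the adapting.

```python
from dataclasses import dataclass, field


@dataclass
class TokenEstimator:
    # Starting guesses per T-shirt size, in tokens. All values are hypothetical.
    estimates: dict = field(
        default_factory=lambda: {"S": 50_000, "M": 250_000, "L": 1_000_000}
    )
    alpha: float = 0.25  # weight given to the newest observation

    def record(self, size: str, actual_tokens: int) -> float:
        """Blend the actual usage of a finished task into that size's estimate."""
        old = self.estimates[size]
        new = (1 - self.alpha) * old + self.alpha * actual_tokens
        self.estimates[size] = new
        return new


est = TokenEstimator()
print(est.record("M", 450_000))  # a medium task that ran well over the initial guess
print(est.estimates["M"])        # the estimate the next medium task would start from
```

A low `alpha` keeps one runaway task from swinging the estimate, which matches the point about treating these numbers as rough guesses rather than commitments.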
As such, the example of estimating CPU cycles for tasks is actually relevant. For instance it is a common practice in real-time embedded systems running on tiny micro-controllers. But it is also possible to get good estimates for more complex applications / OS's / architectures simply by benchmarking them over time.
The most common problem with planning and task estimation is that the corporate dynamics around it are not healthy: leadership often uses those as an SLA instead of the SWAG that they are. I worked on a team where our estimates never matched the actual time taken, partially due to rather unpredictable dependencies and high-priority tasks frequently interrupting us. But because we were clearly very high-functioning, management never held that against us. Those were some healthy corporate dynamics; not all places have that.
While this is a fundamentally stupid story to begin with, it was at least reported somewhat better in other venues. The original report came from The Information, and at least this Yahoo Finance[0] writeup mentioned that. This article has very little content and no sourcing.
[0]: https://finance.yahoo.com/sectors/technology/articles/ubers-...
It's wild that the article frames this as
> what started as an experiment in productivity became a runaway success
and
> figuring out if the company can afford this level of productivity at scale
It seems like they're equating "developers are spending a ton of money on this" with "this is creating a ton of value".
I'm not saying that AI tools aren't valuable, but the article doesn't question this equivalence at all.
Bizarrely I feel like that reflects how a lot of tech leadership are viewing it? I can't explain this behavior but this is the first time I've seen this inversion: leaders believing money spent on something is itself value. I have dev friends who are legitimately under an edict to burn more tokens! It's freakish.
> Uber's unexpected budget burn matters because it signals how valuable AI tools have become to engineering productivity
That's a bit of a logical leap with no demonstrable increase in productivity.
All this shows is that they're spending a lot more on AI than they budgeted for. Nothing else.
I think the tech industry in general is taking advantage of the fact that software productivity is hard to quantify to say whatever they want about their AI productivity gains. Apparently we are past the point of having to justify anything and can just equate increased AI spend with success.
Could be negative! All it shows is that Uber is probably incentivizing token usage just like so many other companies are.
You get what you measure.
If they burned through their ML budget in four months while using heavily subsidized models, we're going to see companies burn through their ML budgets in less than a week once those subsidies are no longer in place and they have to pay per tokens used.....
I didn't see the article mentioning the outcomes achieved because of using AI compared to not using AI. I might be missing it. Mainly, Uber is a business. So profit & loss - both need to be measured to understand the equation.
I wonder how much of this AI budget was spent on their LLM-heavy CI/CD pipeline: https://www.uber.com/us/en/blog/ureview/
I'm considering rolling out something similar but am not sure if it would exceed the expenses of Claude Code Review at an estimated $20 per PR.
AI might not make engineering cheaper — just more elastic. Instead of paying for engineers, you’re effectively paying per unit of thinking. At scale, that could get very expensive very quickly.
I spend $20/month on Gemini Pro and it greatly increased my productivity. I'm still in charge and only use AI for the more tedious or toughest problems. I can't see how these people could be spending this much productively.
I don't know, maybe this will make companies see the actual value in their engineering team. In my company they are starting to see the rotten fruits of the AI push, but it's come at the cost of many jobs, little planning and big ideas.
Exactly how Anthropic, OpenAI and co are selling it.
In the Uber Eats app I can't even request a refund for an incorrect order anymore, because the UI doesn't allow me to scroll down to the "submit" button.
It's been like this for months. I finally got my explanation.
I didn't see a bit where they said how this transformed into more productivity and more profit? What is the point in using AI to make developers more productive if you don't either have more features coded making more money, or fewer developers saving cost?
I am confused - what did they ship based on this spending? - it is totally alright to spend that money if it made significant progress in some area.
or did the engineers just chill and let claude take over daily duties? (this is also a benefit for employees in my opinion)
> the AI coding tools represent a meaningful chunk that nobody expected would require this much capital so quickly
Surprised Pikachu moment.
And it's going to become even more expensive when AI companies start charging to actually make a profit.
Interesting. Some companies have rolled it out to every department with a small budget.
I wonder how this will end as AI becomes more expensive to use. If you can't quantify ROI then I guess you're cooked.
Not surprising, hit my 5h limit on Claude Code Max Plan, had some credits so switched to extended (api). 40 minutes later $30 credits gone... so yeah, I can see how this can happen.
I use a cli tool to build a document of all relevant code and then use ChatGPT 5.5 pro to plan a feature and generate an implementation plan, and then review and edit and paste it into codex on high to implement.
And it works because it won’t stop until the Rust compiles. But the code is garbage and makes bad decisions that no junior would. Unmaintainable junk and sometimes I spend more time refactoring than if I would have just built it myself.
People here talking about generating 100ks LoC a month and I’m wondering if it’s a skill issue with me, or Codex or if I should pull all my investments out of companies heavily invested in AI like uber.
AI coding tools probably need the same boring governance as cloud spend: budgets, alerts, team-level visibility, and a way to spot runaway usage before finance notices.
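A minimal sketch of that kind of governance, assuming you can export usage as (team, user, dollars) rows. The record shape, team names, and the 80% warning threshold are all invented for illustration and don't correspond to any vendor's billing API.

```python
from collections import defaultdict


def spend_alerts(usage, budgets, warn_at=0.8):
    """usage: iterable of (team, user, dollars); budgets: {team: monthly dollars}.

    Returns (team, status, spent) tuples for teams that need attention.
    """
    totals = defaultdict(float)
    for team, _user, dollars in usage:
        totals[team] += dollars

    alerts = []
    for team, spent in sorted(totals.items()):
        budget = budgets.get(team)
        if budget is None:
            alerts.append((team, "no-budget", spent))  # spend nobody planned for
        elif spent >= budget:
            alerts.append((team, "over", spent))
        elif spent >= warn_at * budget:
            alerts.append((team, "warning", spent))
    return alerts


# Hypothetical month-to-date usage export.
usage = [
    ("payments", "alice", 900.0),
    ("payments", "bob", 300.0),
    ("maps", "carol", 410.0),
    ("infra", "dave", 50.0),
]
budgets = {"payments": 1_000, "maps": 500}

for team, status, spent in spend_alerts(usage, budgets):
    print(f"{team}: {status} (${spent:,.2f})")
```

Run on a schedule, something like this would flag the team that is already over, the one approaching its cap, and the one spending with no budget at all, which is the "before finance notices" part.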
Wonder how many tokens would be saved if everyone just put “be brief” in their prompts.
Also wonder if there is some perverse incentive for models to be verbose to juice tokens.
The more I use Claude Code the harder it is for me to believe this behavior is a byproduct of the model. Behavior = ridiculous token inefficiency
> what started as an experiment in productivity became a runaway success
Successfully burning through cash and tokens, alright, but what have they gotten out of it?
It's GPT 5.5 and it still can't do exactly the same thing I want. So, I think companies should call AI a lost cause.
we run an agentic pipeline in a different domain (data sourcing) and the only way the math works is to be ruthless about which stages actually need which model.
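To make "which stages actually need which model" concrete, here is a toy cost comparison. The stage names, token volumes, model tiers, and prices are all hypothetical; the comment doesn't describe the actual pipeline, so this only illustrates the shape of the calculation.

```python
# Hypothetical price per million tokens for three model tiers.
PRICE_PER_MTOK = {"small": 0.25, "mid": 3.00, "frontier": 15.00}

# Hypothetical routing: only the hardest stage gets the most expensive model.
STAGE_MODEL = {
    "extract": "small",
    "dedupe": "small",
    "classify": "mid",
    "resolve_conflicts": "frontier",
}

# Hypothetical monthly token volume per stage.
STAGE_TOKENS = {
    "extract": 40_000_000,
    "dedupe": 10_000_000,
    "classify": 5_000_000,
    "resolve_conflicts": 1_000_000,
}


def pipeline_cost(stage_tokens, routing=None, default_model="frontier"):
    """Total cost in dollars; stages missing from routing use default_model."""
    routing = routing or {}
    total = 0.0
    for stage, tokens in stage_tokens.items():
        model = routing.get(stage, default_model)
        total += tokens / 1_000_000 * PRICE_PER_MTOK[model]
    return total


print(f"everything on frontier: ${pipeline_cost(STAGE_TOKENS):,.2f}")
print(f"routed per stage:       ${pipeline_cost(STAGE_TOKENS, STAGE_MODEL):,.2f}")
```

With these invented numbers the routed pipeline costs a small fraction of the all-frontier one, because the high-volume stages are the ones moved to the cheap tier. Whether quality holds up per stage is the part that has to be measured, not computed.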
As a founder, the question I always have is "what is the marginal value per token relative to engineer-hours saved." More of a gut feel at the moment, but would be great to calculate.
They could have bought all their engineers their own massive GPU. They could have built out their own DC. Nuts...
this is pointless without knowing what they are measuring. you could genuinely be moving faster or you could be optimizing for engineers in a rat race to push more code because all their peers are now doing it because those are the metrics you are measuring for "ai productivity".
There's a line where the unfettered spending is just wasteful, we are well past the line
Might as well get while the getting is good and Anthropic is subsidizing the cost of compute
Honest question, does Uber need that much R&D? And do they expect the ROI to be positive?
i assume this also includes their self driving vehicle research and trucking, not just their consumer mobile app dev
Uber cancelled their self-driving research years ago.
Imagine making your product compliant across 100+ countries while regulations, labor laws, tax rules, insurance requirements, and data privacy laws keep changing.
Imagine integrating dozens of payment methods - many of them highly localized - across emerging and developed markets, while dealing with fraud, chargebacks, KYC, AML, and settlement complexities.
Imagine processing trillions of data points every day - rides, location updates, pricing signals, ETAs, traffic conditions, demand forecasts, payments, support events... storing it efficiently, querying it in near real time, generating reports, and keeping the whole pipeline reliable. I have worked in data engineering, and can tell you confidently that this alone requires an enormous R&D budget.
Then there are the apps - not just customer-facing, but driver-facing, courier-facing, merchant-facing, fleet-management, onboarding, support, operations, compliance, finance, and hundreds of internal tools and dashboards.
Then come the integrations. Companies running at Uber's scale generally have hundreds of these - mapping providers, payment processors, banks, identity verification, tax systems, telecoms, customer support platforms, fraud detection, analytics, ERP, CRM, and more.
... And then there are even more...
Real-time routing and dispatch optimization
Dynamic pricing and marketplace balancing
Fraud detection and account security
Driver/rider safety systems
ML models for ETA, demand forecasting, incentives, and churn prevention
Experimentation infrastructure for thousands of A/B tests
Reliability engineering across globally distributed systems
Data centers / cloud optimization at massive scale
Localization across languages, currencies, addresses, and cultural norms
Customer support automation at global scale
Autonomous vehicle research, mapping, and computer vision
... to be fair, this is all I could think of based on my own work experience in related fields... there are definitely as many more systems in reality as those mentioned above.
This continues to boggle my mind so hopefully somebody can explain how this is happening.
I’ve been using all these tools since they started popping out around 2021 personally and professionally. I probably built four or five products at this point with assistance, not to mention the thousands and thousands of back-and-forth conversations for research or search or rubber ducking or whatever.
I have never spent more than the cost of whatever the top professional plan is, which has consistently been $20 a month.
I asked a friend of mine who spent a couple hundred dollars in a few hours how they did it. The answer was that they were basically getting these groups of agents stuck in a loop, constantly generating verbose bullshit that is never interrogated and doesn’t produce any artifact that is inspectable, no matter how expert you are.
The couple of stories I have heard of these massive crazy spends are people literally just assuming these things can complete an entire human task in one shot, so they continue to hit the “spin the wheel” button until they get something closer to what they want
But I’ve yet to see that actually work
and it actually flies in the face of every instruction guide or documentation or prompt engineering process that has been described over the last almost 5 years
This terrible unsourced article seems to be citing this information piece: https://www.theinformation.com/newsletters/applied-ai/uber-c...
... but the key fact about "$500-$2000" per engineer does not appear there, and seems to be fabricated.
Thank you for the link.
Most people don't have the team and time to do heavy token efficiency engineering. But that's all we do. marketplace.neurometric.ai has a bunch of task specific small models, and we charge flat monthly fees. We bear the token risk.
There is a major disconnect in that people think token usage is exclusively tied to human typing rates...it isn't true. When software developers evolve to using self-managing CLI tools (like Claude Code - the source article mentions this), they are not merely chatting; they are unleashing loops of agency.
When you enter one single inquiry of "find and fix the memory leak in the billing service", you are not submitting just one single inquiry. The tool is searching through an entire code repository for relevant code, pulling 15 related files into context (easily 200k+ tokens), proposing a fix, running the test suite and failing, taking an entire stack trace of errors into context, and looping to keep iterating towards the solution. In that process you can loop multiple times (10+) in a very short period of time (within 5 minutes). While you grab a cup of coffee you will have consumed $20 in token usage. At the enterprise level (like with Uber), when you multiply that out by thousands of software developers using it as a personal shell tool, your budget disappears very, very quickly.
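To make the arithmetic in that comment concrete, here is a back-of-the-envelope sketch. The 200k-token context and 10 iterations come from the comment; the $10 per million input tokens is a hypothetical price chosen so the result lands on the $20 mentioned, not any vendor's actual rate. It also assumes the full context is resent on every iteration and ignores output tokens and prompt caching, so treat it as an illustration of the multiplication, not a bill.

```python
def loop_cost_usd(context_tokens, iterations, usd_per_million_input_tokens):
    """Cost of an agent loop that resends the whole context on every iteration."""
    total_input_tokens = context_tokens * iterations
    return total_input_tokens / 1_000_000 * usd_per_million_input_tokens


def fleet_monthly_cost_usd(cost_per_run_usd, runs_per_dev_per_day, developers, workdays):
    """Scale one run up to a whole engineering org for a month."""
    return cost_per_run_usd * runs_per_dev_per_day * developers * workdays


# 200k tokens of context, 10 iterations, hypothetical $10 per million input tokens.
per_run = loop_cost_usd(200_000, 10, 10.0)
print(per_run)  # 20.0

# Hypothetical org: 1,000 developers, one such run per workday, 20 workdays.
print(fleet_monthly_cost_usd(per_run, 1, 1_000, 20))  # 400000.0
```

The single prompt turns into 2 million input tokens because the context is paid for again on each loop, and one such run per developer per day is already $400k a month at this assumed price and headcount.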
And on your point about the junior developer: comparing $100,000/year in tokens to hiring a junior developer is such a ridiculous false equivalency that it makes you question whether they even understand how to make such a comparison.
The cost to a business of one junior engineer with a $100,000 salary is not just the $100,000 in salary but also an additional $40,000+ in benefits, taxes, and hardware.
Also, you are disregarding another cost of hiring junior engineers that is their mentorship cost. Each week, your senior and staff engineers spend hours mentoring junior engineers by reviewing their code, pairing with them, and unblocking their progress. Mentoring requires a substantial amount of time and will be expensive to your business.
The return on investment (ROI) for the $10,000 monthly expenditure on tokens is not so much about replacing the junior engineer with the AI. Instead, the ROI is that your senior engineers can use the huge amount of compute power to create boilerplate and tests, and refactor their code 3x quicker than if they had to mentor junior engineers. In addition, LLMs do not sleep, require one-on-ones, or leave for another company for 20% more pay after 18 months, just when their knowledge of the code base has made them an asset to your business.
Lastly, the main reason that Uber has problems with their AI business is that due to the UX of these agentic tools, developers think of the API calls made to the AI as free and as a result, treat them like a basic grep command.
What are the sources for the “facts” presented in this post?
No mention of if it actually improved outcomes.
That's not the point silly goose.
Is this a submarine? https://paulgraham.com/submarine.html
It's funny how Paul is recommending people use PR firms, while in more recent videos Michael Seibel and others have strongly recommended against using them. It's interesting how things shift in ~20 years.
But did it make them more productive?
Oh it does... but what happens after 6 months is an entirely different story.
A codebase that has exploded in size 2-3 times in just a few months,... internal architecture that is not layers of simple parts anymore, but layers of complex architectures corresponding to individual agentic runs,... a codebase that now has 10 times more if-else and individual codepaths because you were not clear enough in your requirements, and used the phrase "handle all cases",... a codebase that neither you nor anyone else now understands properly, so nobody can say what's possible anymore, and at what cost, when your manager or PM asks,... and finally, due to the combined effect of these, a need for an ever-increasing token budget, and constantly increasing fragility of new AI-generated code due to repeated context compactions.
And we haven't even touched on the security and performance elements yet.
The right way to use these tools is to use them as what I like to call "code-monkeys". You tell them exactly what you want, where you want it, how to do it, how to architect it, and more... and then make them code.
There are no sources or references.
> When developer productivity tools become so valuable that engineers blow the entire budget in four months, the issue isn't the tool but that the budget was invented too early to forecast this adoption curve.
Where oh where can I find clients like these??
And here comes the reining in of spending. If companies are anything like what I'm seeing:
1 - Company mandate, start using AI
2 - You're afraid? Here's a mandate!
3 - (Devs and others discover Claude Code features where the coolest burn mad tokens)
4 - Um, yeah we're going to have to take a look at the spend here
5 _
What's 5?
We know steps 3 and 4 will cycle a bit more, and we know it's going to cost more - these were startup teaser costs.
This doesn’t work at all
i bet someone mentioned openclaw one too many times
Uber must be the biggest tech company that got lucky with timing. They are so incredibly stupid and incompetent. How on earth do you end up with that cost for AI per user?
I don’t understand. On the ChatGPT pro plan for $200/month, I am essentially running it 24/7 including nights and I can barely get it under the 40% usage mark. Why are companies not using this?
When performance means using AI, it is easy to make it happen.
My company has an all you can eat policy, but I think we'd be well served by being thoughtful in optimizing usage so that we still have the overall capabilities but don't burn extra tokens by sloppy use.
AI token austerity when
Now AI slop factories make the HN front page?
dead internet, mon frère.
> 70% of committed code originating from AI.
How are they calculating that? They could be using my tool, Buildermark, but I don't think they are: https://buildermark.dev