Comment by meander_water
6 days ago
I suspect a large proportion of the claims made about productivity increases are skewed by the fact that the speed at which AI produces code makes you _feel_ productive, but those gains are largely offset by the effort to understand, review, refactor and clean up the code. The high you get when something "works" tends to stick in your memory more than the day you had to spend cleaning up dead code, refactoring 2k-line modules into a more readable project structure, etc.
I'm not saying that AI can't make you productive, it's just that these claims are really hard to verify. Even the recently posted Cloudflare OAuth worker codebase took ~3 months to release (8 Mar - 20 May), producing a single file with >2k lines. Is that going to be harder to maintain than a codebase with a proper project structure that's easily parseable by a human?
> Even the recently posted Cloudflare OAuth worker codebase took ~3 months to release (8 Mar - 20 May)
This is incorrect. The library was part of the MCP framework we launched on March 25 -- the same month development began:
https://blog.cloudflare.com/remote-model-context-protocol-se...
Indeed the speed with which we were able to turn this around was critical to us, as it allowed us to have our Remote MCP framework ready immediately when the spec was finalized, which led to quite a few companies building MCP servers on Cloudflare: https://blog.cloudflare.com/mcp-demo-day/
I'm not an AI maximalist. I still write lots of code by hand, because there's a lot AI isn't good at. It's good at boilerplate and straightforward code, it's bad at refactoring deep systems. But AI assistance was undeniably a huge win for the OAuth project. There's no way I could have written that library by hand so quickly. (Maybe when I was 25 and had no responsibilities, but these days I have like 1 solid day a week to actually write code...)
First commit: Feb 27th 2025
    commit 3b2ae809e9256d292079bb15ea9fe49439a0779c
    Author: Kenton Varda <kenton@cloudflare.com>
    Date:   Thu Feb 27 17:04:12 2025 -0600
Fine, sorry, apparently both of meander_water's dates were incorrect, and I actually started the work two days before March. It was still less than a month from there to release, though.
Apologies, I didn't mean to misrepresent your work. Big fan of your work by the way, I was a happy user of sandstorm.io back in the day.
Ok, sorry to get abstract, but to me what you're talking about is differentiating between understanding and correctness. We as humans, for now, need to understand the code, and that's not easily transmitted from the output of some AI. In fact, that's a hard problem. But I don't think it's impossible for AI to assist humans with that. The AI could help walk humans through the code so they quickly understand what's going on. Maybe ultimately the issue here is trust. Do we trust the AI to write code? Maybe we spend more time trying to verify it for now. I think that shows we place a lot of trust in humans to write code. Maybe that changes.
This is cope. I know my own experience, and I know the backgrounds and problem domains of the friends I'm talking to that do this stuff better than I do. The productivity gains are real. They're also intuitive: if you can't look at a work week and spot huge fractions of work that you're doing that isn't fundamentally discerning or creative, but rather muscle-memory rote repetition of best practices you've honed over your career, you're not trying (or you haven't built that muscle memory yet). What's happening is skeptics can't believe that an LLM plus a couple hundred lines of Python agent code can capture and replicate most of the rote work, freeing all that time back up.
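(To make "a couple hundred lines of Python agent code" concrete, here is a minimal sketch of the kind of loop being described. The call_llm() helper, its message format, and the two tools are hypothetical stand-ins rather than any particular product's API; a real agent is this loop plus a lot of hardening.)

    # Minimal sketch of an LLM-plus-tools agent loop, assuming a hypothetical
    # call_llm() helper that returns either a tool request or a final answer.
    import json
    import pathlib
    import subprocess

    def read_file(path: str) -> str:
        return pathlib.Path(path).read_text()

    def run_tests(cmd: str = "pytest -q") -> str:
        proc = subprocess.run(cmd.split(), capture_output=True, text=True)
        return proc.stdout + proc.stderr

    TOOLS = {"read_file": read_file, "run_tests": run_tests}

    def agent(task: str, max_steps: int = 20) -> str:
        history = [{"role": "user", "content": task}]
        for _ in range(max_steps):
            reply = call_llm(history, tools=list(TOOLS))  # hypothetical LLM call
            if reply["type"] == "final":
                return reply["content"]
            # The model asked for a tool: run it and feed the result back in.
            history.append({"role": "assistant", "content": json.dumps(reply)})
            result = TOOLS[reply["tool"]](**reply["args"])
            history.append({"role": "tool", "name": reply["tool"], "content": result})
        return "gave up after max_steps"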
Another thing I think people are missing is that serious LLM-using coders aren't expecting 100% success on prompts, or anything close to it. One of the skills you (rapidly) develop is the intuition for when to stop a runaway agent.
If an intern spun off hopelessly on a task, it'd be somewhat problematic, because there are finite intern hours and they're expensive. But failed agent prompts are nickel-denominated.
We had a post on the front page last week about someone doing vulnerability research with an LLM. They isolated some target code and wrote a prompt. Then they ran it one hundred times (preemptively!) and sifted the output. That approach finds new kernel vulnerabilities!
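(A rough sketch of that run-many-and-sift pattern, for anyone who hasn't seen it; run_prompt() and looks_like_finding() are hypothetical placeholders, not the tooling from that post.)

    # Fire the same audit prompt N times, keep only outputs that pass a cheap
    # automated filter, and review the survivors by hand.
    # run_prompt() and looks_like_finding() are hypothetical stand-ins.
    from concurrent.futures import ThreadPoolExecutor

    N = 100
    PROMPT = "Audit the attached code for memory-safety bugs; cite file:line and explain."

    def one_attempt(i: int) -> str:
        return run_prompt(PROMPT, seed=i)  # hypothetical LLM call

    with ThreadPoolExecutor(max_workers=10) as pool:
        outputs = list(pool.map(one_attempt, range(N)))

    candidates = [o for o in outputs if looks_like_finding(o)]
    print(f"{len(candidates)} of {N} runs look worth reading by hand")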
Ordinary developers won't do anything like that, but they will get used to the idea of only 2/3 of prompts ending up with something they merge.
Another problem I think a lot of skeptics are running into: stop sitting there staring at the chain of thought logs.
The thing I don't understand is that you keep bringing up your friends' experience in all your responses and in the blog itself. What about your own experience, and the success rate and productivity gain you observed with AI agents? It feels like you yourself aren't confident in your gains and have to bring up secondhand experience from your friends to prop up your arguments.
> if you can't look at a work week and spot huge fractions of work that you're doing that isn't fundamentally discerning or creative, but rather muscle-memory rote repetition of best practices you've honed over your career, you're not trying (or you haven't built that muscle memory yet). What's happening is skeptics can't believe that an LLM plus a couple hundred lines of Python agent code can capture and replicate most of the rote work, freeing all that time back up.
No senior-level engineer worth their salt, and in any kind of minimally effective organization, is spending any meaningful amount of their time doing the rote repetition stuff you're describing here. If this is your experience of work then let me say to you very clearly: your experience is pathological and non-representative and you need to seek better employment :)
Regardless of what people say to you about this, most (all?) undergraduates in CS programs are using LLMs. It's extremely pervasive. Even people with no formal training are using AI and Vercel and churning out apps over a weekend. Even if people find reasons to dislike AI code writing, culturally it's the future. I don't see that changing. So either a huge percentage of the people writing code are doing it all wrong, or times are changing.
Just a data point:
I think it has a lot to do with the type of work you are doing. I am a couple of years into a very small startup that has some actual technology built (as opposed to a really simple CRUD app or something).
When I am working on the front-end, where things are pretty simple, AI is a huge speed-up. What it does VERY well is latch on to patterns and then apply those patterns to other things. If it has a couple of examples you can point it to and say "ok, build that but over here," the newest revisions of Claude and Gemini are perfectly capable of building the whole thing end to end. Because it's a fairly repetitive task, I don't have to spend much time untangling it. I can review it, pattern-match against things that don't look right, and then dive into those.
For a real example, I needed a page for a user to manually add a vendor in our platform. A simple prompt asking Claude to add a button to the page sent it into a mode where it added the button, built the backend handler, added the security checks, defined a form, built another handler for the submitted data, and added it to the database. It even wrote the ACL correctly. The errors it introduced were largely around using vanilla HTML in place of our standard components, plus some small issues with how it attempted to write to the DB using our DB library. This saved me a couple of hours of typing.
Additionally, if I need to refactor something, AI is a godsend. Just today an underlying query builder completely changed its API and broke... everything. Once I identified how I wanted to handle the changes and wrote some utilities, I was able to have Claude find every affected call site and make those same changes. It did it with about 90% accuracy. Once again, that saved me a couple of hours.
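(For illustration only, since the actual query builder isn't named: the "utilities" were thin shims along the lines of the hypothetical one below, so the agent's job reduces to mechanically rewriting each call site against the shim.)

    # Hypothetical shim over a query builder whose API changed. Call sites only
    # need a mechanical rewrite to fetch_rows(...), which an agent can apply
    # repo-wide. The new_builder module and its methods are made up.
    from new_builder import Query  # hypothetical library with the new API

    def fetch_rows(table: str, where: dict | None = None, limit: int | None = None):
        q = Query(table)
        for column, value in (where or {}).items():
            q = q.filter(column, "=", value)
        if limit is not None:
            q = q.limit(limit)
        return q.all()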
Where it fails, usually spectacularly, is when we get to stuff that is new or really complex. If it doesn't have patterns to latch onto, it tries to invent them itself and the code is garbage. Rarely does it work. Attempting to vibe-code it with increasingly pointed prompts will often produce code that compiles, but it almost never does the thing I actually wanted.
In these contexts its usefulness is mostly things like "write a SQL query to do X," which occasionally surfaces a technique I hadn't thought about.
So my experience is pretty mixed. I am definitely saving time, but most of it is typing time, not thinking time, which is maybe 1/3 of my average day. If I had to guess, I am somewhere in the neighborhood of 30-40% faster today than I was in 2019. Notably, that speed-up has allowed me to really stretch this funding round, as we are well past the phase where my past companies would have typically hired someone, usually a relatively mid-level engineer to take over those repetitive tasks.
Instead it's just me and a non-technical founder going along super quickly. We will likely be at a seed round before anyone new comes in.