Comment by marxism
6 days ago
I think we're talking past each other. There's always been a threshold: above it, code changes are worth the effort; below it, they sit in backlog purgatory. AI tools so far seem to lower implementation costs, moving the threshold down so more backlog items become viable. The "5x productivity" crowd is excited about this expanded scope, while skeptics correctly note the highest value work hasn't fundamentally changed.
I think what's happening is two groups using "productivity" to mean completely different things: "I can implement 5x more code changes" vs "I generate 5x more business value." Both experiences are real, but they're not the same thing.
https://peoplesgrocers.com/en/writing/ai-productivity-parado...
My friends at companies where AI tools are either mandated or heavily encouraged report that they're seeing a significant rise in low-quality PRs that need to be carefully read and rejected.
A big part of my skepticism is this offloading of responsibility: you can use an AI tool to write large quantities of shitty code and make yourself look superficially productive at the cost of the reviewer. I don't want to review 13 PRs, all of them secretly AI-generated but dressed up as junior dev output, none of which solve any of our most pressing business problems because they're just pointless noise from the bowels of our backlog, and have that be my day's work.
Such gatekeeping is a distraction from my actual job, which is to turn vague problem descriptions into an actionable spec by wrangling with the business and doing research, and then fix them. The wrangling sees a 0% boost from AI, the research is only sped up slightly, and yeah, maybe the "fixing problems" part of the job will be faster! That's only a fraction of the average day for me, though. If an LLM makes the code I need to review worse, or if it makes people spend time on the kind of busywork that ended up 500 items down in our backlog instead of looking for more impactful tasks, then it's a net negative.
I think what you're missing is the risk, real or imagined, of AI generating 5x more code changes that have overall negative business value. Code's a liability. Changes to it are a risk.
This is exactly what I’ve experienced. For the top-end high-complexity work I’m responsible for, it often takes a lot more effort and research to write a granular, comprehensive product spec for the LLM than it does to just jump in and do it myself.
On the flip side, it has allowed me to accomplish many lower-complexity backlog projects that I just wouldn’t have even attempted before. It expands productivity on the low end.
I’ve also used it many times to take on quality-of-life tasks that just would have been skipped before (like wrapping utility scripts in a helpful, documented command-line tool).
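For example (just a sketch; the script name, flags, and logic below are invented for illustration, not from any real project), the kind of quality-of-life wrapper I mean:

```python
#!/usr/bin/env python3
"""Tiny CLI wrapper around what used to be a bare utility script.

Everything here (clean_exports, --since, --dry-run) is made up for
illustration; the point is getting real --help text and flags instead
of a script only its author can run.
"""
import argparse
from datetime import date


def clean_exports(since: date, dry_run: bool) -> None:
    # Placeholder for the original script's logic.
    action = "Would remove" if dry_run else "Removing"
    print(f"{action} export files older than {since.isoformat()}")


def main() -> None:
    parser = argparse.ArgumentParser(
        description="Clean up stale export files (example utility)."
    )
    parser.add_argument("--since", type=date.fromisoformat,
                        default=date.today(),
                        help="cutoff date, e.g. 2024-01-01")
    parser.add_argument("--dry-run", action="store_true",
                        help="show what would be removed without deleting")
    args = parser.parse_args()
    clean_exports(args.since, args.dry_run)


if __name__ == "__main__":
    main()
```

Running it with --help now documents the tool, instead of relying on a comment block nobody reads. Not hard work, just the kind of thing that never made it off the backlog before.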
This also accounts for the author of TFA's sense that the smartest people they know are skeptics. Assuming those people are being deployed well, they spend far more of their time on high-complexity work than on the low-complexity stuff, so LLMs look more like flashy toys than serious tools to them.
That, or higher level in bigger orgs. I’m fairly senior, but I’m also one of two engineers at a startup. Can’t get away from the low-level work.
> On the flip side, it has allowed me to accomplish many lower-complexity backlog projects that I just wouldn’t have even attempted before
This has been my experience as well - AI coding tools are like a very persistent junior who loves reading specs and documentation. The problem for AI companies is that "automated burndown of your low-complexity backlog items" isn't a moneymaker, even though that's what we actually have. So they have to sell a dream that may be realized, or may not.
The benchmark project in the article is the perfect candidate for AI: well-defined requirements with precise technical terms (RFCs), little room for undefined behavior, and tons of reference implementations. That is an atypical project. I'm confident an AI agent can write an HTTP/2 server, but it will also repeatedly fail to write sensible tests for the human/business processes that a junior would excel at.
I'm currently still somewhat in the AI skeptic camp but you've intrigued me... I'm curious about taking a lesser-known RFC and trying to see what kind of implementation one of the current code-generating models actually comes up with from the spec.
I think this is actually a really good point. I was just recently thinking that LLMs are (amongst other things) great for streamlining these boring energy-draining items that "I just want done" and aren't particularly interesting, but at the same time they do very little to help us juggle more complex codebases right now.
Sure, they might help you onboard into a complex codebase, but that's about it.
They help in breadth, not depth, really. And to be clear, to me that's extremely helpful, cause working on "depth" is fun and invigorating, while working on "breadth" is more often than not a slog, which I'm happy to have Claude Code write up a draft for in 15 minutes, review, do a bunch of tweaks, and be done with.
+1 to this breadth-vs-depth framing. I notice it in aider itself: how does that project manage to support all those command-line options, covering every little detail, all of them optionally settable via environment variables and/or a YAML file, with up-to-date Markdown docs for the whole lot? Answer: aider itself was clearly used to write all that breadth of features.
You seem to think generating 5x more code results in better code, in the left column. I highly doubt this.
Yes there are huge unstated downsides to this approach if this is production code (which prototypes often become).
It depends?
There's certainly a lot of code that needs to be written in companies that is simple and straightforward and where LLMs are absolutely capable of generating code as good as your average junior/intermediate developer would have written.
And of course there are higher complexity tasks where the LLM will completely face plant.
So the smart company chooses carefully where to apply the LLM, and possibly does get 5x more code that is "better" in the sense that 5x more straightforward tickets get closed and shipped, which beats closing and shipping fewer of them.
That wasn't the argument. The argument is that someone using an LLM to create 5x more code will achieve things like "Adding robust error handling" and "Cleaner abstractions".
I'm attempting to vibe code something for the first time. It seems to work, but the amount of cruft being generated is astounding. It's an interesting learning experience, anyways.
I agree 100%! It's amazing how few people grok this.
This reminds me of places that try to measure productivity by lines of code.
> The "5x productivity" crowd is excited about this expanded scope, while skeptics correctly note the highest value work hasn't fundamentally changed.
This is true: LLMs can speed up development (some asterisks are required here, but it's generally true).
That said, I've seen, mainly here on HN, so many people hyping it up way beyond this. I've gotten into arguments here with people claiming it codes at a "junior level", which is an absurd level of bullshit.
Exactly. Juniors may have vastly less breadth of knowledge than an LLM, but they can learn and explore and test software in ways that LLMs cannot.
However, the expansion in scope that senior developers can tackle now will take away work that would ordinarily be given to juniors.
> And there are many things one junior could be helpful with that a different junior would be useless at.
That may be true, and it would be an interesting topic to discuss if people actually spoke in such a way.
"Developers are now more productive in a way that many projects may need less developers to keep up productivity levels" is not that catchy to generate hype however.