Comment by rfw300

4 hours ago

Super interesting study. One curious thing I've noticed is that coding agents tend to increase the code complexity of a project, but simultaneously massively reduce the cost of that code complexity.

If a module becomes unsustainably complex, I can ask Claude questions about it, have it write tests and scripts that empirically demonstrate the code's behavior, and worse comes to worst, rip out that code entirely and replace it with something better in a fraction of the time it used to take.

That's not to say complexity isn't bad anymore—the paper's findings on diminishing returns on velocity seem well-grounded and plausible. But while the newest (post-Nov. 2025) models often make inadvisable design decisions, they rarely do things that are outright wrong or hallucinated anymore. That makes them much more useful for cleaning up old messes.

14 comments

rfw300

joshribakoff 4 hours ago

Bad code has real world consequences. Its not limited to having to rewrite it. The cost might also include sanctions, lost users, attrition, and other negative consequences you don’t just measure in dev hours

SR2Z 4 hours ago
Right, but that cost is also incurred by human-written code that happens to have bugs.
In theory experienced humans introduce less bugs. That sounds reasonable and believable, but anyone who's ever been paid to write software knows that finding reliable humans is not an easy task unless you're at a large established company.
- MeetingsBrowser 4 hours ago
  
  The question then becomes, can LLMs generate code close to the same quality as professionals.
  In my experience, they are not even close.
  
  3 replies →
- verdverm 4 hours ago
  
  There was a recent study posted here that showed AI introduces regressions at an alarming rate, all but one above 50%, which indicates they spend a lot of time fixing their own mistakes. You've probably seen them doing this kind of thing, making one change that breaks another, going and adjusting that thing, not realizing that's making things worse.

MeetingsBrowser 4 hours ago

This only helps if you notice the code is bad. Especially in overlay complex code, you have to really be paying attention to notice when a subtle invariant is broken, edge case missed, etc.

Its the same reason a junior + senior engineer is about as fast as a senior + 100 junior engineers. The senior's review time becomes the bottleneck and does not scale.

And even with the latest models and tooling, the quality of the code is below what I expect from a junior. But you sure can get it fast.

phillipclapham 2 hours ago

This is the most important point in the thread. The study measures code complexity but the REAL bottleneck is cognitive load (and drain) on the reviewer.
I've been doing 10-12 hour days paired with Claude for months. The velocity gains are absolutely real, I am shipping things I would have never attempted solo before AI and shipping them faster then ever. BUT the cognitive cost of reviewing AI output is significantly higher than reviewing human code. It's verbose, plausible-looking, and wrong in ways that require sustained deep attention to catch.
The study found "transient velocity increase" followed by "persistent complexity increase." That matches exactly. The speed feels incredible at first, then the review burden compounds and you're spending more time verifying than you saved generating.
The fix isn't "apply traditional methods" — it's recognizing that AI shifts the bottleneck from production to verification, and that verification under sustained cognitive load degrades in ways nobody's measuring yet. I think I've found some fixes to help me personally with this and for me velocity is still high, but only time will tell if this remains true for long.

AlexandrB 2 hours ago

> Super interesting study. One curious thing I've noticed is that coding agents tend to increase the code complexity of a project, but simultaneously massively reduce the cost of that code complexity.

This is the same pattern I observed with IDEs. Autocomplete and being able to jump to a definition means spaghetti code can be successfully navigated so there's no "natural" barrier to writing spaghetti code.

i_love_retros 2 hours ago

> have it write tests

Just make sure it hasn't mocked so many things that nothing is actually being tested. Which I've witnessed.

moregrist 2 hours ago
I’ve also seen Opus 4.5 and 4.6 churn out tons of essentially meaningless tests, including ones where it sets a field on a structure and then tests that the field was set.
You have to actually care about quality with these power saws or you end up with poorly-fitting cabinets and might even lose a thumb in the process.
- teaearlgraycold 1 hour ago
  
  The first thing you should do after having them write tests is delete half of the tests.