Comment by raggi
3 days ago
They averaged producing 47% more code on the AI tasks, but took only 20% more time. The report glosses over these considerations, and I’m left wondering: was the extra code superfluous, or did it produce better structure and manage debt better? If that extra 47% of code translates into lower debt and more consistent throughput over the long term, I might take that trade, given how badly projects get crushed by debt. Anyway, it’s hard to say much: there are large statistical differences in the outcomes but no measure of what they mean, though I’m sure they mean something. That meaning matters a ton.
> They averaged producing 47% more code on the AI tasks, but took only 20% more time. The report glosses over these considerations, and I’m left wondering: was the extra code superfluous, or did it produce better structure and manage debt better? If that extra 47% of code translates into lower debt and more consistent throughput over the long term, I might take that trade, given how badly projects get crushed by debt.
Wouldn't it be the opposite? I'd expect the code would be 47% longer because it's worse and heavier in tech debt (e.g. code repeated in multiple places instead of being factored out into a function).
Honestly my experience from using AI to code (primarily claude sonnet) is that that "extra 47%" is probably itself mostly tech debt. Places where the AI repeated itself instead of using a loop. Places where the AI wrote tests that don't actually test anything. Places where the AI failed to produce a simple abstraction and instead just kept doing the same thing by hand. Etc.
AI isn't very good at being concise, in my experience, to the point of producing worse code. That's a strange inversion from humans, who if anything tend to err on the side of being too concise, though not to the same degree.
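A toy illustration of the "repeated itself instead of using a loop" pattern described above (the data shapes and names here are made up, not from the study):

```python
# Hypothetical example: the repetitive style often seen in generated
# code, versus the single loop a reviewer would ask for instead.

def totals_repeated(orders):
    # One near-identical block per status: more lines, more to keep in sync.
    pending = sum(o["amount"] for o in orders if o["status"] == "pending")
    shipped = sum(o["amount"] for o in orders if o["status"] == "shipped")
    returned = sum(o["amount"] for o in orders if o["status"] == "returned")
    return {"pending": pending, "shipped": shipped, "returned": returned}

def totals_loop(orders):
    # Same behavior, one accumulation loop over the data.
    totals = {"pending": 0, "shipped": 0, "returned": 0}
    for o in orders:
        if o["status"] in totals:
            totals[o["status"]] += o["amount"]
    return totals
```

Both functions return the same result; the first is simply three copies of the same idea, which is exactly the kind of line-count inflation being discussed.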
Your response implies the AI-produced code was landed without review. That’s a possible outcome, but I’d hope it’s unlikely to account for the whole group at this scale. We’re of course still lacking data.
I very much doubt that when individual programmers are producing significantly more code with the help of AI that somehow the review process simultaneously scales up to perform adequate review of all of that extra code.
In my experience, review was inadequate back before we had AI spewing forth code of dubious quality. There's no reason to think it's any better now.
An actually-useful AI would be one that would make reviews better, do them itself, or at least help me get through reviews faster.
I have two responses to the "code review fixes these problems" argument.
One: The work to get code to a reviewable point is significant. Skipping it, either with or without AI, is just going to elongate the review process.
Two: The whole point of using AI is to outsource the thought to a machine that can think much faster than you can in order to ship faster. If the normal dev process was 6 hours to write and 2 hours to review, and the AI dev process was 1 hour to write and 8 hours to review, the author will say "hey why is review taking so long; this defeats the purpose". You can't say "code review fixes these problems" and then bristle at the necessary extra review.
Can we have a linter for both high verbosity/repetitiveness and high terseness? I know copy-paste detectors and cognitive-complexity linters are related. I recently generated code that interleaved multiple spreadsheet worksheets and cell-formatting boilerplate with data queries. I asked the AI to move the boilerplate into another class and expose .write_balance_row(), and it did it perfectly. If a tool reported this, huge changes wouldn't have to reach human reviewers; the AI could just iterate until it passes the linter.
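The repetitiveness half of such a linter could start as simply as flagging any normalized window of N lines that appears more than once. A minimal sketch (window size and whitespace normalization are arbitrary choices here, not how CPD or any real tool works):

```python
# Sketch of a naive copy-paste detector: report any run of `window`
# stripped lines that occurs at more than one location in the source.
from collections import defaultdict

def find_repeated_blocks(source, window=4):
    lines = [ln.strip() for ln in source.splitlines()]
    seen = defaultdict(list)  # window text -> 1-based starting line numbers
    for i in range(len(lines) - window + 1):
        chunk = "\n".join(lines[i:i + window])
        if chunk.strip():  # ignore windows that are entirely blank
            seen[chunk].append(i + 1)
    return {chunk: locs for chunk, locs in seen.items() if len(locs) > 1}
```

A real tool would normalize identifiers and tokenize rather than compare raw lines, but even this crude version would catch the "same block pasted three times" pattern, and an AI could loop on its output the same way it loops on test failures.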
I have an extremist take on this:
All source code is technical debt. If you increase the amount of code, you increase the amount of debt. It's impossible to reduce debt with more code. The only way to reduce debt is by reducing code.
(and note that I'm not measuring code in bytes here; switching to single-character variable names would not reduce debt. I'm measuring it in statements, expressions, instructions; reducing those without reducing functionality decreases debt)
I'll try a counterargument. If more code is more technical debt then writing more succinct code is less technical debt. But succinct code is often harder to grok and maintain than code written for the average Joe dev. So less code can sometimes mean less maintainability and thus more technical debt.
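A contrived example of that trade-off (both functions are invented for illustration): the one-liner is strictly less code, but many reviewers find the spelled-out version easier to modify safely.

```python
# Hypothetical pair: same behavior, different densities.

def active_emails_terse(users):
    # Succinct: one expression doing filtering, normalization, and sorting.
    return sorted({u["email"].lower() for u in users
                   if u.get("active") and "@" in u.get("email", "")})

def active_emails_plain(users):
    # Spelled out: more statements, but each step is visible and editable.
    emails = set()
    for user in users:
        email = user.get("email", "")
        if user.get("active") and "@" in email:
            emails.add(email.lower())
    return sorted(emails)
```

If debt scales with statement count, the terse version wins; if it scales with time-to-understand, the answer is less clear.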
I think you instead meant to say more business logic implemented in code is more technical debt, not necessarily just more code.
No, I really mean more code. It's an unpopular opinion I know, but I think debt scales linearly with code, mainly because I also think bugs scale linearly with code. I recognise that readability and maintainability are important, but it doesn't change the basic equivalence of code = debt for me.