Comment by gpm

3 days ago

Honestly, my experience from using AI to code (primarily Claude Sonnet) is that the "extra 47%" is probably itself mostly tech debt. Places where the AI repeated itself instead of using a loop. Places where the AI wrote tests that don't actually test anything. Places where the AI failed to produce a simple abstraction and instead just kept doing the same thing by hand. Etc.
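A contrived Python sketch of that first failure mode, with invented names, just to show the shape of it:

```python
from collections import namedtuple

Row = namedtuple("Row", "day amount")
rows = [Row("mon", 3.0), Row("tue", 1.5), Row("mon", 2.0)]

# The repetitive style the model tends to produce:
totals = {
    "mon": sum(r.amount for r in rows if r.day == "mon"),
    "tue": sum(r.amount for r in rows if r.day == "tue"),
    "wed": sum(r.amount for r in rows if r.day == "wed"),
}

# The loop a reviewer would ask for instead:
totals = {day: sum(r.amount for r in rows if r.day == day)
          for day in ("mon", "tue", "wed")}
```

Three near-identical lines is harmless; the versions I get back do this across dozens of branches.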

In my experience, AI isn't very good at being concise, to the point of producing worse code. That's a strange inversion from humans, who may have a habit of erring too far toward concision, but not to anywhere near the same degree.

Your response implies the AI-produced code was landed without review. That's a possible outcome, but I would hope it's unlikely to account for the whole group at this scale. We're of course still lacking data.

  • I very much doubt that, when individual programmers are producing significantly more code with the help of AI, the review process somehow scales up simultaneously to perform adequate review of all that extra code.

    In my experience, review was inadequate back before we had AI spewing forth code of dubious quality. There's no reason to think it's any better now.

    An actually useful AI would be one that makes reviews better, does them itself, or at least helps me get through reviews faster.

  • I have two responses to the "code review fixes these problems" argument.

    One: The work to get code to a reviewable state is significant. Skipping it, with or without AI, just drags out the review process.

    Two: The whole point of using AI is to outsource the thought to a machine that can think much faster than you can in order to ship faster. If the normal dev process was 6 hours to write and 2 hours to review, and the AI dev process was 1 hour to write and 8 hours to review, the author will say "hey why is review taking so long; this defeats the purpose". You can't say "code review fixes these problems" and then bristle at the necessary extra review.

Can we have a linter for both high verbosity/repetitiveness and high terseness? I know copy-paste detectors and cognitive-complexity linters are related. I recently generated code that interleaved multiple spreadsheet worksheets and cell-formatting boilerplate with the data-querying logic. I asked the AI to move the boilerplate into another class and expose .write_balance_row(), and it did it perfectly. If a tool reported this kind of thing, huge changes wouldn't have to reach human reviewers, and the AI could iterate until it passed the linter.
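The repetitiveness half seems mechanically checkable. A minimal sketch of the idea, assuming a token-window copy-paste detector (the script, its name, and the WINDOW threshold are all invented for illustration, not an existing tool):

```python
# dupes.py -- flag repeated token windows in a Python source file.
import sys
import tokenize
from collections import defaultdict

WINDOW = 30  # tokens per window; arbitrary threshold, tune to taste

# Skip layout-only tokens so formatting differences don't hide duplicates.
SKIP = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
        tokenize.INDENT, tokenize.DEDENT, tokenize.ENCODING}

def repeated_windows(path):
    with open(path, "rb") as f:
        toks = [t.string for t in tokenize.tokenize(f.readline)
                if t.type not in SKIP]
    # Record every sliding window of WINDOW tokens; any window seen
    # at more than one position is a copy-paste candidate.
    seen = defaultdict(list)
    for i in range(len(toks) - WINDOW + 1):
        seen[tuple(toks[i:i + WINDOW])].append(i)
    return {w: pos for w, pos in seen.items() if len(pos) > 1}

if __name__ == "__main__":
    dupes = repeated_windows(sys.argv[1])
    print(f"{len(dupes)} repeated {WINDOW}-token windows found")
```

Run it as `python dupes.py some_module.py`; anything it flags is a candidate for exactly the kind of extraction the AI did with the spreadsheet boilerplate. The terseness half is harder to score; the cognitive-complexity metrics you mention are probably the closest existing proxy.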