Comment by Maxatar

8 days ago

There's nothing shocking about this. The vast majority of software/source code is pretty terrible anyways: code that is full of bugs, slow to use, has little to no automated tests, and is very hard to maintain.

To the extent that it gets fixed or works at all, it's not because of competent developers doing rigorous analysis of the software, it's because either someone testing it or using it gets annoyed, reports an issue, and then that specific issue gets patched out.

If using LLMs to perform a similar function shocks you, then you should have been shocked already by the proliferation of pretty bad software for the better part of the last couple of decades.

So many criticisms of LLMs assume that people have been writing software very diligently, applying a high standard of engineering, subjecting the code to a battery of rigorous tests, passing it through a strict review process... and that does happen for some software, especially software that is commonly used, but it's not true for the vast majority of software developed.

“AI is no good, but neither are people” isn’t a great sales pitch.

I think for small tools that people want to make for themselves, that’s great. Where I see problems is when other people and money get involved. If something goes wrong, who is accountable? Claude wrote it, Claude reviewed it, Claude submitted the PR… yet Claude can’t have any real accountability.

  • "A computer can never be held accountable

    Therefore a computer must never make a management decision"

    -- Internal IBM training manual, 1979

  • I think small tools people make for themselves are realistically less than 1% of software produced. Most of the code, and - to the GP’s point - bad code, is produced in corporations with plenty of money and budget.

    There is just such a tremendous amount of waste at every company, in that the headcount and software expand to fill the budget. I’m not defending Elon, but look at how much he slashed from X (80% or so?) and the company still has its core product functioning and an active user base.

    There is a ton of software (especially internal) at essentially every company that also is low accountability before Claude. “Oh Ted built that but he’s working on a new important project. I understand it’s broken and that’s impacting you but we won’t be able to prioritize this until next quarter at least. Can you set up a meeting next month to discuss?”

    Honestly the outcome for all of these LLMs is indeed likely a higher amount of software with no accountability, but it’s also an improved ability to juggle more of that software to the same (realistically low) standard.

  • It's an absolutely phenomenal sales pitch to executives. A ton of automation is sold on the basis that it's probably not going to be as good as having a dedicated person do it, but that automation leads to much lower maintenance, scales better, and is more deterministic and reproducible.

> little to no automated tests

I'm still amazed people don't achieve extremely high test quality, since you get tests "for free" now.

One of the limitations of testing was always that people "design" things so they're hard to test.

And then they argue "This can't be tested", or "Refactoring this for testing is not worth it."

It is now. Yet, I work on codebases with no tests and lots of yolo co-authoring.

  • You get quantity of tests, but the tests are not good quality by default, at all.

    • I’m not sure how you can say something general about the quality of tests unless you mean by simply prompting “make tests” or similar.

      Yes, I’ve experienced that those tests succeed, and the app still breaks trivially on first run.

      What I mean is: you design the tests. You analyse patterns. You insist on making testable code (average code by humans isn’t, so neither is average code by LLMs unless you specify testability as a design constraint).

      One way to get testable code is to mock all interfaces. This is usually expensive, but not difficult for an AI, because you can set the success criterion to be interface exactness of your mock for a series of plausible and somewhat extensive interactions.

      The tests you can make with AI are as good as you can make them otherwise; you just save time doing them, which should justify making more extensive testing.
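
To make the "interface exactness" criterion for mocks a bit more concrete, here's a rough sketch of how I'd read it: replay the same scripted interactions against the real thing and against the mock, and require identical outcomes, errors included. Everything here is made up for illustration (`KeyValueStore`, `SqliteStore`, `FakeStore`, `replay`, `SCRIPT`); an in-memory SQLite table just stands in for whatever the real dependency would be.

```python
import sqlite3
from typing import Protocol


class KeyValueStore(Protocol):
    """Hypothetical interface that both the real store and the mock must satisfy."""

    def put(self, key: str, value: str) -> None: ...
    def get(self, key: str) -> str: ...
    def delete(self, key: str) -> None: ...


class SqliteStore:
    """Stand-in for the 'real' dependency (assumption: an in-memory SQLite table)."""

    def __init__(self) -> None:
        self._db = sqlite3.connect(":memory:")
        self._db.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")

    def put(self, key: str, value: str) -> None:
        self._db.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", (key, value))

    def get(self, key: str) -> str:
        row = self._db.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        if row is None:
            raise KeyError(key)
        return row[0]

    def delete(self, key: str) -> None:
        cursor = self._db.execute("DELETE FROM kv WHERE k = ?", (key,))
        if cursor.rowcount == 0:
            raise KeyError(key)


class FakeStore:
    """The mock: a dict that is supposed to behave exactly like SqliteStore."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> str:
        return self._data[key]

    def delete(self, key: str) -> None:
        del self._data[key]


def replay(store: KeyValueStore, script):
    """Run each (method, args) step and record the return value or the exception type."""
    outcomes = []
    for method, args in script:
        try:
            outcomes.append(("ok", getattr(store, method)(*args)))
        except Exception as exc:
            outcomes.append(("raised", type(exc).__name__))
    return outcomes


# A short "plausible interaction": overwrite, read back, delete, then hit the error paths.
SCRIPT = [
    ("put", ("a", "1")),
    ("get", ("a",)),
    ("put", ("a", "2")),
    ("get", ("a",)),
    ("delete", ("a",)),
    ("get", ("a",)),
    ("delete", ("a",)),
]


def mock_matches_real(script=SCRIPT) -> bool:
    """Success criterion: the mock and the real store produce identical outcomes."""
    return replay(SqliteStore(), script) == replay(FakeStore(), script)
```

In practice you'd want many more and longer scripts (or generated ones), but the point is that the pass/fail condition is mechanical, which is what makes it cheap to hand off.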