
Comment by paulryanrogers

16 hours ago

How? Are you thinking of adversarial AI reviewers, runtime tests (also by AI), or something else?

Guess I just don't see how you can take the human out of the loop and replace them with non-deterministic AIs and informal prompts / specs.

Humans are also non-deterministic, though. Why does replacing one non-deterministic actor with another matter here?

I'm not particularly swayed by arguments of consciousness, whether AI is currently capable of "thinking", etc. Those may matter right now... but how long will they continue to matter for the vast majority of use cases?

Generally speaking, my feeling is that most code doesn't need to be carefully-crafted. We have error budgets for a reason, and AI is just shifting how we allocate them. It's only in certain roles where small mistakes can end your company - think hedge funds, aerospace, etc. - where there's safety in the non-determinism argument. And I say this as someone who is not in one of those roles. I don't think my job is safe for more than a couple of years at this point.

  • It has nothing to do with whether small mistakes are allowable or not. It’s about customers needing a consistent product.

    The in-code tests and the expectations/assumptions about the product that your users have are wildly different. If you allow agents to make changes restricted only by those tests, they’re going to constantly make changes that break customer workflows and cause noticeable jank.

    Right now agents do this at a rate far higher than humans. This is empirically demonstrable by the fact that an agent requires tests to keep from spinning out of control when writing more than a few thousand lines, and a human does not. A human is capable of writing tens of thousands of lines with no tests, using only reason and judgement. An agent is not.

    They clearly lack the full capability of human reason, judgment, taste, and agency.

    My suspicion is that something close enough to AGI that it can essentially do all white collar jobs is required to solve this.

  • > Generally speaking, my feeling is that most code doesn't need to be carefully-crafted. We have error budgets for a reason, and AI is just shifting how we allocate them. It's only in certain roles where small mistakes can end your company - think hedge funds, aerospace, etc. - where there's safety in the non-determinism argument.

    That's a bit shortsighted. There have been cries of software becoming needlessly bloated and inefficient for as long as computers have existed (Wirth, of course, but countless others too). Do you visit any gamer communities? They are constantly blaming careless waste of resources and lack of optimization for many AAA games performing badly on even state-of-the-art hardware, or constantly requiring you to upgrade your gaming rig.

    I don't think the only scenario is boring CRUD or line of business software, where indeed performance often doesn't matter, and most of it can now be written by an AI.

    • Even in CRUD line of business software, lack of performance causes enormous problems that the current software development culture glosses over.

      Just one example I've seen time and again. You take an application that, if optimized, could run on a single server (maybe 2 if you absolutely have to have zero-downtime deployments), but because no one cares about performance it runs on 10 or more. You now have a complexity avalanche that rapidly blows up, and then you need more hierarchy to handle the additional organizational complexity, etc...

      Then people start breaking out pieces of the app so they can scale them separately and before long you're looking at 200 engineers to do a job that certainly doesn't need that many people.

      I realize I'm ignoring a whole lot of other issues that result in this kind of complexity, but lack of performance contributes to this a lot more than people want to admit.

> adversarial AI reviewers, runtime tests (also by AI), or something else?

And spec management, change previews, feedback capture at runtime, skill libraries, project scaffolding, task scoping analysis, etc.

Right now this stuff is all rudimentary, DIY, or non-existent. As the more effective ways to use LLMs become clearer, I expect we'll see far more polished, tightly-integrated tooling built to use LLMs in those ways.

  • Agents require tests to keep from spinning out of control when writing more than a few thousand lines, but we know that tests are wildly insufficient to describe the state of the actual code.

    You are essentially saying that we should develop other methods of capturing the state of the program to prevent unintended changes.

    However, there’s no reason to believe that these other systems will be any easier to reason about than the code itself. If we had these other methods of ensuring that observable behavior doesn’t change, and they were substantially easier than reasoning about the code directly, they would be very useful for human developers as well.

    The fact that we’ve not developed something like this in 75 years of writing programs says it’s probably not as easy as you’re making it out to be.