Comment by feastingonslop

20 days ago

The code itself does not matter. If the tests pass, and the tests are good, then who cares? AI will be maintaining the code.

Subsequent iterations of models will have to deal with that code, and it will become harder and harder to fix bugs and introduce features without triggering or introducing more defects.

Biological evolution overcomes this by running thousands or millions of variations in parallel and letting the more defective ones crash and die. In software ecosystems, we can't afford that luxury.

An example: the model had a complete hash map interface available. The task was to delete elements. Instead of using the hash map API, it iterated through the entire underlying array to remove a single entry. The expected solution was O(1); what it implemented was O(n). These decisions compound: the software may technically work, but the user experience suffers.
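A rough sketch of the pattern described (hypothetical names; the original code isn't shown), assuming Python for illustration:

```python
def remove_entry_slow(table: dict, key) -> None:
    # What the model reportedly wrote: iterate the whole
    # structure to remove a single entry -- O(n).
    for k in list(table):
        if k == key:
            del table[k]

def remove_entry_fast(table: dict, key) -> None:
    # What the existing hash-map API already offers -- O(1),
    # and a no-op if the key is absent.
    table.pop(key, None)
```

Both pass any test that only checks the entry is gone afterwards, which is exactly how this kind of regression slips through when nobody reads the code.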

  • If you have particular performance requirements like that, then include them. Test for them. You still don't have to actually look at the code: either the software meets expectations or it doesn't, and you keep having the AI work on it until you're satisfied.

    • How deep do you want to go? Because a reasonable person wouldn't have expected to hand-hold an AI(ntelligence) at that level. Of course, after I pointed it out, it corrected itself. But that involved looking at the code and knowing the code was poor. If you don't look at the code, how would you know to state this requirement? Somehow you have to assess the level of intelligence you are dealing with.


That assumes no human will ever go near the code; that, over time, it doesn't get out of hand (inference time and token limits are still a thing); and that anti-patterns don't accumulate until the code is a logical mess that produces bugs through a web of specific behaviors instead of proper architecture.

However, I guess at least some of that can be mitigated by distilling out a system description and then running agents again to refactor the entire thing.

  • > However I guess that at least some of that can be mitigated by distilling out a system description and then running agents again to refactor the entire thing.

    The problem with this is that the code is the spec. There are 1000 times more decisions made in the implementation details than are ever going to be recorded in a test suite or a spec.

    The only way for that to work differently is if the spec is as complex as the code, and at that level, what's the point?

    With what you’re describing, every time you regenerate the whole thing you’re going to get different behavior, which is just madness.

    • You could argue that all the way down to machine code, but clearly at some point and in many cases, the abstraction in a language like Python and a heap of libraries is descriptive enough for you not to care what’s underneath.


  • And that is the right assumption. Why would any humans need (or even want) to look at code any more? That’s like saying you want to go manually inspect the oil refinery every time you fill your car up with gas. Absurd.

    • Cars may be built by robots but they are maintained by human technicians. They need a reasonable layout and a service manual. I can’t fathom (yet) having an important codebase - a significant piece of a company’s IP - that is shut off to engineers for auditing and maintenance.

Tests don't cover everything. Performance? Edge cases? Resource-usage optimization? These are not typically covered by tests.
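To make that concrete, here's a minimal sketch (the names and data are mine, not from the thread): two functions with identical observable behavior, so any functional test treats them the same, even though one scans linearly and the other does a hash lookup.

```python
# A functional test can't tell these apart: both answer
# membership queries correctly.
data_list = list(range(100_000))
data_set = set(data_list)

def found_in_list(x) -> bool:
    # O(n): scans the list on every query.
    return x in data_list

def found_in_set(x) -> bool:
    # O(1) average: hashes into the set.
    return x in data_set

# Identical answers, hence identical test results:
for probe in (0, 99_999, -1):
    assert found_in_list(probe) == found_in_set(probe)
```

Per-query cost differs by orders of magnitude at scale, and no plain correctness assertion will flag it; you'd need a dedicated performance test (or a look at the code).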

  • Humans not caring about performance is so common that we have Wirth's law.

    But now that the clankers are coming for our jobs, suddenly we're all optimization specialists.

    • It’s not about optimizing for performance; it’s about non-deterministic performance between “compiler” runs.

      The ideal that spec-driven developers are pushing toward is that you’d check in the spec, not the code. Any time you needed the code, you’d just regenerate it. The problem is that different models, different runs of the same model, and slightly different specs will produce radically different code.

      It’s one thing when your program is slow; it’s something completely different when your program’s performance varies wildly between deployments.

      This problem isn’t limited to performance; it’s every implicit implementation detail not captured in the spec. And it’s impossible to capture every implementation detail in the spec without the spec becoming as complex as the code.
