Comment by falcor84

17 hours ago

> so the code doesn't devolve into slop

As I see it, this is pretty much a given across all codebases: every project naturally tends toward a ball of mud unless the developer(s) actively pay down technical debt and continuously refactor to meet new needs. But having said that, my experience is that for a given task in an unfamiliar codebase, an AI agent is already better at maintaining consistency than a typical junior developer, or even a mid-level developer who recently joined the team. And when explicitly given the task of refactoring the codebase while keeping the tests passing, AI agents are already very capable.

The biggest issue, which may be what you're alluding to, is that AI agents are currently very bad at recognizing the limits of their capabilities: they keep trying an approach long after a human dev would have given up and gone to their lead to ask for help or to have the task specification redefined. That's definitely an issue, but I don't see a fundamental technological limitation here; it seems addressable through engineering effort.

In general, I've seen so many benchmarks fall to AI over the past decade (including SWE-BENCH) that I'm now quite confident that if a task performed by humans can be defined with clear numerical goals, then it's achievable by AI.

Another way I look at it: for any specific knowledge-work competency, it already seems much easier and more time-effective to train an AI to do it well than to create a curriculum for humans and then have every single person go through it.