Comment by vidarh

5 days ago

I've come to the opposite conclusion: the big limitation of systems like this is that human involvement starts and ends at the same level, instead of directing at a higher level. You end up quibbling over details the agents can handle themselves with sufficient guardrails and process, instead of setting higher-level requirements, reviewing higher-level decisions and outcomes, and dealing with exceptions.

You can afford a lot of extra guardrails and process to ensure sufficient quality when the result is a system that gets improved autonomously 24/7.

I'm on my way home from a client, and meanwhile another project has spent the last 10 hours improving with no involvement from me. I spent a few minutes reviewing things this morning, after it's spent the whole night improving unattended.

I don't believe comments like this. Sure, it worked for ten hours, but if you didn't review it, you will sooner or later when it breaks. And it will. I run agents all day, and that's what happens: they do unwanted things that you aren't aware of.

I find that that doesn’t work in the long run. Software agents are not yet capable of maintaining a decently active repository for extended periods of time.

I am all for delegating everything to AI agents, but it just becomes a mess over time if you don’t steer things often enough.

  • Not my experience at all. If anything, they make dealing with tech debt cheap enough that it is far easier to justify being strict.

    EDIT: I'll add that you can't expect it to guess what you want, but you can let it manage how it delivers it. We don't expect e.g. a product manager to dictate how developers deliver the code, just what the acceptance criteria are, and that's where I'm headed.