Comment by logicprog
18 hours ago
I think it's nonsensical to insist that it would only be a subjective improvement. The tests either exist and ensure that there aren't bugs in certain areas, or they don't. The agent is either in a feedback loop with those tests and continues to work until it has satisfied them or it doesn't.
That sounds like a very specific implementation strategy related to TDD.
Red-Green TDD is one of the main "agent patterns" Simon proposes, so it seemed relevant.
Also, the same thing applies to feedback loops with compilers and linters: they provide objective feedback that the model then goes and addresses, verifiably resolving each diagnostic.
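The shape of that loop is the same whatever the checker is. A minimal sketch (the names `run_checks` and `apply_fix` are hypothetical, and the "defect counter" is a toy stand-in for a real test suite or linter):

```python
def feedback_loop(run_checks, apply_fix, max_iters=10):
    """Generic agent feedback loop: an objective checker (tests, compiler,
    linter) either passes or returns diagnostics; the agent keeps revising
    until the checker is satisfied or the iteration budget runs out."""
    for _ in range(max_iters):
        ok, diagnostics = run_checks()
        if ok:
            return True          # objective signal: all checks satisfied
        apply_fix(diagnostics)   # revise the code using the feedback
    return False

# Toy stand-in: the "code" is a defect counter; each "fix" removes one defect.
state = {"defects": 3}
done = feedback_loop(
    run_checks=lambda: (state["defects"] == 0, f"{state['defects']} failing"),
    apply_fix=lambda diag: state.update(defects=state["defects"] - 1),
)
print(done)  # True: the loop converged after three fixes
```

The point is that success here is binary and externally verifiable, not a matter of taste.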
Even with less verifiable things like specifications, the fact that they rely on less objective grounding doesn't mean there's no change in the model's behavior. Compare the code a model produces, and the amount of intervention needed to get there, when it's asked to build something with a specification versus without one: I'm sure you would see an objective difference on average. We're already getting objective studies regarding AGENTS.MD.