Comment by webprofusion
10 days ago
The tone of the article is that getting AI agents to do anything is fundamentally wrong, because they'll make mistakes and it's expensive to run them.
So:
- Humans make mistakes all the time and we happily pay for those by the hour as long as the mistakes stay within an acceptable threshold.
- Models/agents will get cheaper as diminishing returns in result quality become more common. The hardware to run them will get cheaper and less power-hungry as it becomes more of a commodity.
- In all cases, It Depends.
If I ask a human tester to test the UI and API of my app (which will take them hours), the documented tests and expected results are the same as if I asked an AI to do it. The cost of having an AI do it may be the same or less, but I can ask the AI to do it again for every change, or every week, etc. I have genuinely started to test this way.
It depends what you mean by agent, first of all, but I'm going to assume you mean what I've called "narrow agency" here [0]: "[an LLM] that can _plan and execute_ tasks that happen outside the chat window".
That humans make mistakes all the time is the reason we encode business logic in code and automate systems. An "if" statement will always be faster and more reliable, and have better observability, than a human or an LLM-based reasoning agent.
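As a rough illustration (the refund policy, function name, and thresholds below are made up for this comment, not taken from the article):

```python
# Business logic encoded directly in code: deterministic, fast, trivially
# testable, and fully observable. No agent is asked to "reason" about policy.
def approve_refund(order_total: float, days_since_purchase: int) -> bool:
    # Illustrative thresholds only.
    return order_total <= 100 and days_since_purchase <= 30
```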
0: https://sgnt.ai/p/agentic-ai-bad-definitions/
> Humans make mistakes all the time and we happily pay for those by the hour as long as the mistakes stay within an acceptable threshold.
We don't, however, continue to pay the same person who keeps making the same mistakes and doesn't learn from them, which is what happens with LLMs.
This is why easy "out of the box" continual learning is absolutely essential in practice. It's not that the LLM is incapable of solving tasks; it simply wasn't trained for your specific one. There are optimizer frameworks like DSPy that let you validate against a test dataset to increase reliability at the expense of generality.
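A minimal sketch of what that looks like, assuming DSPy's optimizer and evaluation API; the task, metric, model name, and example data here are illustrative, not from the comment:

```python
import dspy

# Any supported model works; this one is just an assumption for the sketch.
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# The specific task we care about, as a simple DSPy program.
qa = dspy.Predict("question -> answer")

# Metric used both to optimize and to validate reliability.
def exact_match(example, prediction, trace=None):
    return example.answer.strip().lower() == prediction.answer.strip().lower()

# Small labelled datasets for this task (hypothetical examples).
trainset = [
    dspy.Example(question="What is 2 + 2?", answer="4").with_inputs("question"),
    dspy.Example(question="Capital of France?", answer="Paris").with_inputs("question"),
]
devset = [
    dspy.Example(question="What is 3 + 3?", answer="6").with_inputs("question"),
]

# Optimize prompts/few-shot demos against the metric: more reliable on this
# task, at the expense of generality.
optimizer = dspy.BootstrapFewShot(metric=exact_match)
tuned_qa = optimizer.compile(qa, trainset=trainset)

# Validate against a held-out set to see how reliable it actually is.
score = dspy.Evaluate(devset=devset, metric=exact_match)(tuned_qa)
print(score)
```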