Comment by PradeetPatel
4 days ago
The proposed industry solution is to use agents to review PRs, so as not to slow down the velocity of delivery...
My current workplace is going through a major "realignment" exercise to replace as many testers with agents as humanly possible, which has proved to be a challenge when the existing process is not well documented.
The fact that anyone in leadership would ever think this is even remotely possible - given my experience with the general state of requirements / contracts / integrations / support - makes me bleed from my earholes just a little bit.
It's starting to just feel a little like an excuse to call everyone on deck for "a few weeks trying 9-9-6". But even then the lack of traction isn't between the eyeballs and the deployment. You'll still be spinning wheels in that slippery stuff between what a customer is thinking and what the iron they bought is doing.
So you essentially trust the output of the model from beginning to end? Curious to know what type of application you're building where you can safely do that.
Edit: to clarify, I know these models have gotten significantly better. The output is pretty incredible sometimes, but trusting it end to end like that just seems super risky still.
I guarantee you it's nothing quantifiable.
LLMs can't be responsible for deciding what code you use because they have no skin in the game. They don't even have skin.
If you type fast, well then it takes just as long to code it yourself as to review it. Plus you actually get flow time when you're coding.
For heaven's sake, people, have the robot write your unit tests and dashboards, not your production code. Otherwise delete yourself.
"Hey Claude, did Claude do a good job?"
I did an experiment today, where I had a new Claude agent review the work of a former Claude agent - both Opus 4.6 - on a large refactor on a 16k LOC project. I had it address all issues it found, then I cleared context, and repeated. Rinse and repeat. It took 4 iterations before it approached nitpicking. The fact that each agent found new, legitimate problems that the last one had missed was concerning to me. Why can’t it find all of them at once?
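For anyone who wants to try the same thing, the loop looked roughly like the sketch below. This is only an illustration: `review_until_nitpicks`, `toy_review`, `toy_fix` and the `severity` field are made-up names, and the toy reviewer stands in for an agent call that starts from a cleared context each round. It is not the actual harness I used.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical issue shape: {"severity": "major" | "nitpick", "note": "..."}
Issue = Dict[str, str]


def review_until_nitpicks(
    code: str,
    review: Callable[[str], List[Issue]],
    fix: Callable[[str, List[Issue]], str],
    max_rounds: int = 10,
) -> Tuple[str, int]:
    """Repeat review -> fix until a round reports nothing but nitpicks.

    Returns the final code and the number of rounds that found real issues.
    """
    rounds = 0
    for _ in range(max_rounds):
        issues = review(code)  # assumed: fresh reviewer, no memory of earlier rounds
        real = [i for i in issues if i["severity"] != "nitpick"]
        if not real:
            break  # only nitpicks left, stop here
        code = fix(code, real)
        rounds += 1
    return code, rounds


# Toy stand-ins (assumptions, not a real agent): the reviewer only ever
# notices one remaining "BUG" marker per pass, and the fixer repairs one.
def toy_review(code: str) -> List[Issue]:
    if "BUG" in code:
        return [{"severity": "major", "note": "found a BUG marker"}]
    return [{"severity": "nitpick", "note": "naming"}]


def toy_fix(code: str, issues: List[Issue]) -> str:
    return code.replace("BUG", "ok", 1)


final_code, rounds = review_until_nitpicks("BUG BUG BUG BUG", toy_review, toy_fix)
```

The point of the toy reviewer is that each pass surfaces only part of what is wrong, so the number of rounds depends on the reviewer's blind spots and not just on the code.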
You're expecting it to be a person. It's not.
It is more like a wiggly search engine. You give it a (wiggly) query and a (wiggly) corpus, and it returns a (wiggly) output.
If you are looking for a wiggly sort of thing, like 'MAKE Y WITH NO BUGS' or 'THE BUGS IN Y', it can be kinda useful. But thinking of it as a person because it vaguely communicates like a person will get you into problems because it's not.
You can try to paper over it with some agent harness or whatever, but you are really making a slightly more complex wiggly query that handles some of the deficiency space of the more basic wiggly query: "MAKE Y WITH NO ISSUES -> FIND ISSUES -> FIX ISSUE Z IN Y -> ...".
OK well what is an issue? _You_ are a person (presumably) and can judge whether something is a bug or a nitpick or _something you care about_ or not. Ultimately, this is the grounding that the LLM lacks and you do not. You have an idea about what you care about. What you care about has to be part of the wiggly query, or the wiggly search engine will not return the wiggly output you are looking for.
You cannot phrase a wiggly query referencing unavailable information (well, you can, but it's pointless). The following query is not possible to phrase in a way an LLM can satisfy (and this is the exact answer to your question):
- "Make what I want."
What you want is too complicated, and too hard, and too unknown. Getting what you are looking for reduces to: query for an approximation of what I want, repeating until I decide it no longer surfaces what I want. This depends on an accurate conception of what you want, so only you can do it.
If you remove yourself from the critical path, the output will not be what you want. Expressing what you want precisely enough to ground a wiggly search would just be something like code, and obviates the need for wiggly searching in the first place.
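As a rough sketch of that "query, judge, repeat" loop with the person on the critical path: `generate` below is a hypothetical stand-in for the model call and `judge` is you. None of these names come from a real library, and the toy functions exist only to make the sketch runnable.

```python
from typing import Callable, Optional


def refine_with_human(
    query: str,
    generate: Callable[[str], str],
    judge: Callable[[str], Optional[str]],
    max_rounds: int = 5,
) -> Optional[str]:
    """Query for an approximation, let the person judge it, repeat.

    `judge` returns None to accept the candidate, or a piece of feedback
    that gets folded into the next query. Returns None if nothing was
    accepted within `max_rounds`.
    """
    for _ in range(max_rounds):
        candidate = generate(query)
        feedback = judge(candidate)
        if feedback is None:
            return candidate
        # What the person cares about has to become part of the query.
        query = f"{query}\n{feedback}"
    return None


# Toy stand-ins (assumptions): the "model" just echoes the query back,
# and the "person" cares about one thing the first query never mentioned.
def toy_generate(query: str) -> str:
    return f"output for [{query}]"


def toy_judge(candidate: str) -> Optional[str]:
    if "handle empty input" in candidate:
        return None
    return "handle empty input"


result = refine_with_human("make Y", toy_generate, toy_judge)
```

Swap `toy_judge` for something that is not you and the stopping condition no longer tracks what you want, which is the whole problem.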