Comment by malexw
13 hours ago
I think Martin Fowler's "Refactoring" might give a bit of insight here. One of my takeaways after reading that book is that the specific implementation of a function is not very important if you have tests. He argues that it can sometimes be easier to completely rewrite a function than to take the time to understand it - as long as you can validate that your rewrite behaves the same way. This mindset lines up pretty closely with how I've been using LLMs.
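To make that concrete, here's a minimal sketch of the idea: pin down the current behavior with tests, then rewrite the implementation freely and let the same tests validate the swap. The function and its cases are hypothetical, just to illustrate:

```python
import re

def slugify_old(title):
    # original implementation: lowercase, collapse runs of
    # non-alphanumerics into single dashes, character by character
    out = []
    prev_dash = False
    for ch in title.lower():
        if ch.isalnum():
            out.append(ch)
            prev_dash = False
        elif not prev_dash:
            out.append("-")
            prev_dash = True
    return "".join(out).strip("-")

def slugify_new(title):
    # complete rewrite; the shared tests below are what make
    # swapping this in safe without fully understanding the old code
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

# behavior pinned down once, checked against both implementations
CASES = {
    "Hello, World!": "hello-world",
    "  spaces  everywhere  ": "spaces-everywhere",
    "already-slugged": "already-slugged",
    "": "",
}

for title, expected in CASES.items():
    assert slugify_old(title) == expected
    assert slugify_new(title) == expected
```

The point is that the test table, not the old function body, is the thing you actually need to understand and trust.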
If that's true, then I would think the emphasis in code review should be more on test quality and verifying that the spec is captured accurately, and as you suggest, the actual implementation is less important.
This is why I've been pushing back on the "just have the AI generate the tests!" mentality. Sure, let it help you, but those tests are the guarantee of quality and fit for purpose. If you vibe code them, how the hell do you know if it even does what you think it does?
You should be planning out the tests to properly exercise the spec, and ensuring those tests actually do what the spec requires. AI can suggest more tests (but be careful here, too, because a ballooned test suite slows down CI/CD), but it should never be in charge of them completely.
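For what it's worth, "planning out the tests from the spec" looks roughly like this to me: one hand-written test per spec clause, with AI-suggested tests appended only after review. The spec and names here are hypothetical, just a sketch:

```python
# Hypothetical spec: orders of $100+ get 10% off the subtotal; the
# discount never applies to shipping; the total is never negative.

def order_total(subtotal, shipping):
    discount = subtotal * 0.10 if subtotal >= 100 else 0.0
    return max(subtotal - discount + shipping, 0.0)

# one test per spec clause, written and understood by a human
assert order_total(100.0, 0.0) == 90.0    # threshold is inclusive
assert order_total(99.99, 0.0) == 99.99   # just below threshold: no discount
assert order_total(100.0, 10.0) == 100.0  # shipping is not discounted
assert order_total(0.0, 0.0) == 0.0       # total never negative
```

If you can't map each test back to a sentence in the spec, that's the signal the suite has ballooned.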
A related book I've been thinking about in terms of LLMs is "Working Effectively With Legacy Code". I'd love to be able to work a lot of that advice into some kind of Skill or customized agent to help with big refactors.
Oh gosh - now that you mention it, it was "Working Effectively with Legacy Code" that I was thinking of, not "Refactoring".
That's my experience with agentic development so far, a lot of extra time goes into testing.
Problem is, the way I've been trained to test isn't exactly antagonistic. QA does that kind of thing. Programmers writing tests are generally doing spot checks, which only make sense if the code is broadly understood and trustworthy. Code that LLMs produce is usually broken in subtle, hard-to-spot ways.
Counter-point: developers who get used to not caring about function implementations will culturally also stop caring as much about test implementations, making this proposed ideal impossible.
With LLMs, tests cost almost no effort but provide tremendous value.
> wow that's a lot of code, how will we ever review it?
>> have a model generate a bunch of tests instead
> wow that's a lot of test code, how will we know it's working correctly?
>> review it
> :face-with-rolling-eyes:
And you know those tests are correct how?