Comment by zmmmmm
6 hours ago
My experience with using AI tools for code review is that they do find critical bugs (from my retrospective analysis, maybe 80% of the time), but the signal-to-noise ratio is poor. It's really hard to get them not to tell you 20 highly speculative reasons why the code is problematic along with the one critical error. And in almost all cases, sufficient human attention would also have identified the critical bug, so human attention is the primary bottleneck here. Thus the poor signal-to-noise ratio isn't a side issue; it's one of the core issues.
As a result, I'm mostly using this selectively so far, and I wouldn't want it turned on by default for every PR.
> but the signal-to-noise ratio is poor
Nail on the head. Every time I've seen it applied, it's awful at this. However, this is the one thing I loathe in human reviews as well, where people leave twenty comments about naming and the one actual FUNCTIONAL issue is buried inside all of that mess. A good code reviewer knows how to drop all the things that irk them and hyperfocus on what matters, if there's a functional issue with the code.
I wonder if AI is ever gonna be able to conquer that one, as it's quite nuanced. If it does, though, then I feel the industry as it is today is kinda toast for a lot of developers, because outside of agency, this is the one thing we were sorta holding out on as not being very automatable.
Naming comments can be very useful in code that gets read by a lot of people. It can make the process of understanding the code much quicker.
On the other hand, if it's less important code or the renaming is not clearly an improvement, it can be quite useless. But I've met some developers who have the opinion that reviews are pointless and just say "this works, just approve it already", which can be very frustrating when it's a codebase with a lot of collaboration.
Naming comments are useful when someone catches something like:
1. you are violating a previously agreed upon standard for naming things
2. inconsistent naming, e.g. some places you use "catalog ID" and other places you use "item ID" (using separate words and spaces here because case is irrelevant; see the sketch below this list).
3. the name you chose makes it easy to conflate two or more concepts in your system
4. the name you chose calls into question whether you correctly understood the problem domain you are addressing
I'm sure there are other good naming comments, but this is a reasonable representation of the kinds of things a good comment will address.
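To make point 2 concrete, here is a hypothetical TypeScript sketch (the interface, function, and names are my own invention, not from any codebase discussed here) of the same value traveling under two names:

// Hypothetical sketch: one concept, two names.
// A reviewer flagging this stops readers from assuming two distinct IDs exist.
interface CatalogEntry {
  catalogId: string; // here the concept is called "catalog ID"
  price: number;
}

function lookupPrice(itemId: string, entries: CatalogEntry[]): number | undefined {
  // ...but here the very same value arrives as "item ID"
  return entries.find((e) => e.catalogId === itemId)?.price;
}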
However, most naming comments are just bike shedding.
> Naming comments can be very useful in code that gets read by a lot of people. It can make the process of understanding the code much quicker.
Yes, but it can have severely diminishing returns. Like, let's step back a second and ask ourselves if:
var itemCount = items.Count;
vs
var numberOfItems = items.Count;
is ever worth spending the time discussing, versus how much of a soft improvement it makes to the code base. I've literally been in a meeting room with three other senior engineers killing 30 minutes discussing this and I just think that's a complete waste of time. They're not wrong, the latter is clearer, but if you have a PR that improves the repo and you're holding it back because of something like this, then I don't think you have your priorities straight.
A lot of these comments are not pointing out actual issues, just "That's not how I would have done it" type comments.
Yeah, or worse, like my boss. We don't have a style guide, but he always wants style changes in every PR, and those style changes are sometimes contradictory across different PRs.
Eventually I told him "if your comment does not affect performance or business logic, I'm ignoring it". He finally got the message. The fact that he accepted this tells me that deep down he knew his comments were just bike shedding.
You should have a style guide, or adopt one. Having uniform code is incredibly valuable as it greatly reduces the cognitive load of reading it. Same reason that Go's verbose "err != nil" works so well.
I've been in teams like this - people who are lower on the chain of power get run in circles as they change to appease one, then change to appease another, then change back to appease the first again.
Then, going through their code, they make excuses about their code not meeting the same standards they demand.
As the other responder recommends, a style guide is ideal; you can even create an unofficial one and point to it when conflicting style requests are made.
Human comments tend to be short and sweet, like "nit: rename creatorOfWidgets to widgetFactory", whereas AI code review comments are long-winded and not as precise. So even if there are 20 human comments, I can easily see which are important and which aren't.
My coworker is so far on this spectrum it's a problem. He writes sentences with half the words missing, making it actually difficult to understand what he is trying to suggest.
The non-critical words in English aren't useless bloat; they remove ambiguity and act as a kind of error correction if something is wrong.
it "nit" short for nitpick? I think prefixing PR comments with prefixes like that is very helpful for dealing with this problem.
That's not even mentioning that a not-insignificant part of the point of code reviews is to propagate understanding of the evolution of the code base among the other team members. The reviewer benefits from the act of reviewing as well.
I've been using it a bit lately, and at first I was enjoying it, but then it quickly devolved into finding different minor issues with each iteration, including a lovely loop of "check against null rather than undefined", then "check against undefined rather than null", and so on.
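For the curious, a minimal TypeScript sketch of that loop (the function and names are hypothetical, not from my actual code): a strict check against only one of null or undefined invites the opposite suggestion, while a loose equality check covers both:

// Hypothetical sketch of the review loop described above.
function getLabel(value: string | null | undefined): string {
  // Iteration 1 suggestion: "check against null rather than undefined"
  // if (value === undefined) return "none";
  // Iteration 2 suggestion: "check against undefined rather than null"
  // if (value === null) return "none";
  // Loose equality matches both null and undefined, which sidesteps the loop:
  if (value == null) return "none";
  return value;
}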
It very much depends on the product. In my experience, Copilot has a terrible signal-to-noise ratio, but Bugbot is incredible: very little noise, and it consistently finds things the very experienced humans on my team didn't.
I agree but find it's fairly easy noise to ignore.
I wouldn't replace human review with LLM review, but it's a good complement that can be run less frequently than human review.
Maybe that's why I find it easy to ignore the noise: I have it do a huge review task after a lot of changes have happened. It'll find 10 or so things, and the top 3 or 4 are likely good ones to look deeper into.
You should try Codex. There's a pretty wide gap between the quality of code review tools out there.
For the signal-to-noise reason, I start with Claude Code reviewing a PR, then I selectively choose what I want to bubble up to the actual review. Oftentimes there's additional context not available to the model, or it's just nitpicky.
Agreed.
I have to constantly push back against it proposing C++ library code, like std::variant, when C-style basics are working great.
I absolutely hate the verbosity of AI. I know that you can give it context; I have done it, and it helps a little. It will still give me 10 "ideas", many of which are closely related to each other.