Comment by bArray

4 days ago

LLMs for code review, rather than code writing/design, could be the killer feature. I think code review has been broken for a while now, and this could be a way forward. Of particular interest would be security issues, undefined behaviour, basic misuse of features, and double-checking compiler warnings against the source to make sure they aren't something more serious.
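
That warning-triage idea is easy to prototype. A minimal sketch, assuming an OpenAI-style chat API; the model name, prompt wording, and example inputs are all placeholders, not a recommendation:

```python
# Sketch of "double-check a compiler warning against the source".
# Assumes the OpenAI Python SDK and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

def triage_warning(warning: str, source_snippet: str) -> str:
    """Ask the model whether a warning hides a real bug or is noise."""
    prompt = (
        "You are reviewing C++ code. Given the compiler warning and the "
        "relevant source below, say whether it points at undefined "
        "behaviour, a security issue, or harmless noise, and why.\n\n"
        f"Warning:\n{warning}\n\nSource:\n{source_snippet}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

print(triage_warning(
    "warning: comparison of integer expressions of different signedness",
    "for (int i = 0; i < buf.size(); ++i) { ... }",
))
```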

My current use of LLMs is typically via the search engine when trying to get information about an error. It has maybe a 50% hit rate, which is okay because I'm typically asking about an edge case.

ChatGPT is great for debugging common issues that have been written about extensively on the web (before the training cutoff). It's a synthesizer of Stack Overflow and greatly cuts down on the time it takes to figure out what's going on compared with searching for discussions and reading them individually.

(This IP rightly belongs to the Stack Overflow contributors and is licensed to Stack Overflow. It ought to be those parties who are exploiting it. I have mixed feelings about participating as a user.)

However, the LLM output is also noisy because of hallucinations — just less noisy than web searching.

I imagine that an LLM could assess a codebase and find common mistakes, problematic function/API invocations, etc. However, there would also be a lot of false positives. Are people using LLMs that way?
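
One way to keep the false positives manageable is to ask for structured findings with a confidence score and filter on it. A sketch, again assuming an OpenAI-style API; the JSON schema, threshold, and file path are made up:

```python
# Sketch of a codebase scan that expects structured findings and drops
# low-confidence ones, since false positives are the main cost.
import json
from pathlib import Path
from openai import OpenAI

client = OpenAI()

def scan_file(path: Path, min_confidence: float = 0.7) -> list[dict]:
    prompt = (
        "Review this file for bugs, API misuse, and undefined behaviour. "
        'Respond with JSON: {"findings": [{"line": int, "issue": str, '
        '"confidence": number from 0 to 1}]}. Use an empty list if '
        "nothing stands out.\n\n" + path.read_text()
    )
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},  # nudge toward valid JSON
    )
    findings = json.loads(resp.choices[0].message.content)["findings"]
    # Trusting the model's own confidence score is crude, but it cuts noise.
    return [f for f in findings if f.get("confidence", 0) >= min_confidence]

for finding in scan_file(Path("src/parser.cpp")):
    print(f"{finding['line']}: {finding['issue']}")
```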

If you do "please review this code" in a loop, you'll eventually find a case where the chatbot starts by changing X to Y, and a bit later changes Y back to X.

It works for code review, but you have to be judicious about which changes you accept and which you reject. If you know enough to recognise an improvement when you see one, it's pretty great at spitting out candidate changes for you to triage.

Why isn't this talked about more? I'm not a developer, but I work very closely with many. They're all on a spectrum from zero interest in this technology to actively using it to write code (inversely correlated with seniority in my sample set), yet there's very little talk of using it for reviews/checks. Perhaps that needs to happen passively on commit.
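
Passive-on-commit could be as simple as a hook that never blocks anything. A sketch, reusing the same hypothetical OpenAI-style API as above; the prompt is improvised:

```python
#!/usr/bin/env python3
# Hypothetical .git/hooks/pre-commit (must be executable): send the staged
# diff for an advisory LLM review, print the notes, never block the commit.
import subprocess
import sys
from openai import OpenAI

diff = subprocess.run(
    ["git", "diff", "--cached"], capture_output=True, text=True
).stdout

if diff.strip():
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{
            "role": "user",
            "content": "Briefly review this diff for bugs and risky "
                       "changes. Be terse.\n\n" + diff,
        }],
    )
    print("LLM review notes (advisory only):")
    print(resp.choices[0].message.content)

sys.exit(0)  # passive: surface feedback, but never reject the commit
```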

  • The main issue with LLMs is that they can't "judge" contributions correctly. Their review is very nitpicky about things that don't matter and often misses big issues that a human familiar with the codebase would recognise. In the end it's almost just noise.

    That's why everyone is moving to the agent thing. Even if the LLM makes a bunch of mistakes, you still have a human doing the decision-making, and that gets you some determinism back.

  • So far, it seems pretty bad at code review. You'd get more mileage by configuring a linter.

Yeah, Claude Code (Opus 4) is really marvelous at code review. I just give it the PR link and it does the rest (via the gh CLI). It gives back a Markdown file that usually has some gems in it. It's definitely improving the quantity of "my" feedback, and certainly reducing the time required.
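
One way to script roughly that workflow, assuming Claude Code's non-interactive `-p` (print) flag and the `gh` CLI are installed and authenticated; the prompt and output filename are arbitrary:

```python
# Fetch the PR diff with gh, pipe it through Claude Code's print mode,
# and save the resulting review as Markdown.
import subprocess
import sys

pr = sys.argv[1]  # PR number or URL

diff = subprocess.run(
    ["gh", "pr", "diff", pr], capture_output=True, text=True, check=True
).stdout

review = subprocess.run(
    ["claude", "-p", "Review this PR diff for bugs, security issues, and "
     "questionable design. Respond in Markdown."],
    input=diff, capture_output=True, text=True, check=True,
).stdout

with open("review.md", "w") as f:
    f.write(review)
print("wrote review.md")
```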

> LLMs for code review, rather than code writing/design could be the killer feature

This is already available on GitHub using Copilot as a reviewer. The suggestions aren't the best, but they're usable enough to keep it in the loop.