Comment by usrbinbash

1 month ago

> My answer to this is to often get the LLMs to do multiple rounds of code review

So I am supposed to trust the machine, that I know I cannot trust to write the initial code correctly, to somehow do the review correctly? Possibly multiple times? Without making NEW mistakes in the review process?

Sorry no sorry, but that sounds like trying to clean a dirty floor by rubbing more dirt over it.

13 comments

usrbinbash

atonse 1 month ago

It sounds to me like you may not have used a lot of these tools yet, because your response sounds like pushback around theoreticals.

Please try the tools (especially either Claude Code with Opus 4.5, or OpenAI Codex 5.2). Not at all saying they're perfect, but they are much better than you currently think they might be (judging by your statements).

AI code reviews are already quite good, and are only going to get better.

gixco 1 month ago
Why is the go-to always "you must not have used it" in lieu of the much more likely experience of having already seen and rejected first-hand the slop that it churns out? Synthetic benchmarks can rise all they want; Opus 4.5 is still completely useless at all but the most trivial F# code and, in more mainstream affairs, continues to choke even on basic ASP.NET Core configuration.
- atonse 1 month ago
  
  About a year ago they sucked at writing elixir code.
  Now I use them to write nearly 100% of my elixir code.
  My point isn’t a static “you haven’t tried them”. My point is, “try them every 2-3 months and watch the improvements, otherwise your info is outdated”
usrbinbash 1 month ago
> It sounds to me like you may not have used a lot of these tools yet
And this is more and more becoming the default answer I get whenever I point out obvious flaws of LLM coding tools.
Did it occur to you that I know these flaws precisely because I work a lot with, and evaluate the performance of, LLM based coding tools? Also, we're almost 4y into the alleged "AI Boom" now. It's pretty safe to assume that almost everyone in a development capacity has spent at least some effort evaluating how these tools do. At this point, stating "you're using it wrong" is like assuming that people in 2010 didn't know which way to hold a smartphone.
Sorry no sorry, but when every criticism towards a tool elecits the response that people are not using it well, then maybe, just maybe, the flaw is not with all those people, but with the tool itself.
- atonse 1 month ago
  
  Spending 4 years evaluating something that’s changing every month means almost nothing, sorry.
  Almost every post exalting these models’ capabilities talks about how good they’ve gotten since November 2025. That’s barely 90 days ago.
  So it’s not about “you’re doing it wrong”. It’s about “if you last tried it more than 3 months ago, your information is already outdated”
  
  1 reply →

pluralmonad 1 month ago

Implementation -> review cycles are very useful when iterating with CC. The point of the agent reviewer is not to take the place of your personal review, but to catch any low hanging fruit before you spend your valuable time reviewing.

usrbinbash 1 month ago
> but to catch any low hanging fruit before you spend your valuable time reviewing.
And that would be great, if it wern't for the fact that I also have to review the reviewers review. So even for the "low hanging fruit", I need to double-check everything it does.
Which kinda eliminates the time savings.
- pluralmonad 1 month ago
  
  That is not my perspective. I don't review every review, instead use a review agent with fresh context to find as much as possible. After all automated reviews pass, I then review the final output diff. It saves a lot of back and forth, especially with a tight prompt for the review agent. Give the reviewer specific things to check and you won't see nearly as much garbage in your review.

hombre_fatal 1 month ago

Well, you can review its reasoning. And you can passively learn enough about, say, Rust to know if it's making a good point or not.

Or you will be challenged to define your own epistemic standard: what would it take for you to know if someone is making a good point or not?

For things you don't understand enough to review as comfortably, you can look for converging lines of conclusions across multiple reviews and then evaluate the diff between them.

I've used Claude Code a lot to help translate English to Spanish as a hobby. Not being a native Spanish speaker myself, there are cases where I don't know the nuances between two different options that otherwise seem equivalent.

Maybe I'll ask 2-3 Claude Code to compare the difference between two options in context and pitch me a recommendation, and I can drill down into their claims infinitely.

At no point do I need to go "ok I'll blindly trust this answer".

ctoth 1 month ago

Wait until you start working with us imperfect humans!

Ronsenshi 1 month ago
Humans do have capacity for deductive reasoning and understanding, at least. Which helps. LLMs do not. So would you trust somebody who can reason or somebody who can guess?
- galangalalgol 1 month ago
  
  People work different than llms they fond things we don't and the reverse is also obviously true. As an example, a stavk ise after free was found in a large monolithic c++98 codebase at my megacorp. None of the static analyzers caught it, even after modernizing it and getting clang tidy modernize to pass, nothing found it. Asan would have found it if a unit test had covered that branch. As a human I found it but mostly because I knew there was a problem to find. An llm found and explained the bug succinctly. Having an llm be a reviewer for merge requests males a ton of sense.