Comment by eru
2 days ago
That's why you have Codex review the code.
(I'm only half joking. Having one LLM review the PRs of another is actually useful as a first line filter.)
Even having Opus review code written by Opus works very well as a first pass. I typically have it run a sub-agent to review its own code using a separate prompt. The sub-agent gets fresh context, so it won't get "poisoned" by the top-level context's justifications for the questionable choices it might have made. The prompts then direct the top-level instance to repeat the verification step until the sub-agent gives the code a "pass", fixing any issues flagged along the way.
The result is change sets that still need review - and fixes - but are vastly cleaner than if you review the first output.
Doing runs with other models entirely is also good - they will often identify different issues - but you can get far with sub-agents and different personas. (And you can, if you like, have Claude Code use a sub-agent to run Codex and prompt it for a review, or vice versa - a number of the CLI tools seem to have "standardized" on `-p <prompt>` for asking a question on the command line.)
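As a sketch, the review loop described above could be wired up as a small harness - assuming an agent CLI that takes the prompt as a trailing argument (as with `-p <prompt>`) and accepts extra context on stdin. The command names, flags, and PASS convention here are all illustrative, not any tool's actual contract:

```python
import subprocess

def run_agent(agent_cmd, prompt, stdin_text=""):
    """Invoke an agent CLI (e.g. ["claude", "-p"] - adjust for your tool)
    with a prompt, passing extra context on stdin; returns its stdout."""
    result = subprocess.run(agent_cmd + [prompt], input=stdin_text,
                            capture_output=True, text=True)
    return result.stdout.strip()

def review_until_pass(agent_cmd, get_diff, max_rounds=3):
    """Have a fresh-context reviewer judge the current diff, and ask for
    fixes until it replies PASS or we hit the round limit."""
    for _ in range(max_rounds):
        verdict = run_agent(
            agent_cmd,
            "Review this diff with fresh eyes. Reply PASS if it is clean, "
            "otherwise list the issues.",
            stdin_text=get_diff(),
        )
        if verdict.startswith("PASS"):
            return True
        # Feed the reviewer's findings back as a fix request.
        run_agent(agent_cmd, "Fix these review findings:\n" + verdict)
    return False
```

Each `run_agent` call spawns a fresh process, which is what gives the reviewer its clean context.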
Basically, reviewing output from Claude (or Codex, or any model) that hasn't been through multiple automated review passes by a model first is a waste of time - it's like reviewing the first draft from a slightly sloppy and overly self-confident developer who hasn't bothered checking if their own work even compiles first.
Thanks, that all sounds very reasonable!
> Basically, reviewing output from Claude (or Codex, or any model) that hasn't been through multiple automated review passes by a model first is a waste of time - it's like reviewing the first draft from a slightly sloppy and overly self-confident developer who hasn't bothered checking if their own work even compiles first.
Well, that's what the CI is for. :)
In any case, it seems like a good idea to also feed compiler errors, warnings, and linter output back to your coding agent.
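One cheap way to do that is a wrapper that runs the project's build and lint commands and bundles anything they complain about into the next prompt. The commands below are placeholders - substitute whatever your project actually uses:

```python
import subprocess

def collect_feedback(commands):
    """Run build/lint commands and collect output from any that fail or warn.
    `commands` is a list of argv lists, e.g. [["cc", "-Wall", "-c", "main.c"]]
    (placeholders - use your project's real check commands)."""
    chunks = []
    for cmd in commands:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0 or result.stderr:
            chunks.append("$ " + " ".join(cmd) + "\n"
                          + result.stdout + result.stderr)
    return "\n".join(chunks)

def fix_prompt(feedback):
    """Wrap the collected diagnostics in a prompt for the coding agent."""
    return ("The checks below failed or produced warnings; "
            "fix the underlying issues:\n\n" + feedback)
```

Commands that pass cleanly contribute nothing, so a clean build yields an empty prompt and the loop can stop.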
> Well, that's what the CI is for. :)
Sure, but I'd prefer to catch it before that, not least because it's a simpler feedback loop to ensure Claude fixes its own messes.
> In any case, it seems like a good idea to also feed the output of compiler errors and warnings and the linter back to your coding agent.
Claude seems to "love" to use linters and error messages when given the chance, or when the project structure hints at an ecosystem where certain tools are usually available. Even just listing, by name, a set of commands it can use to check things in CLAUDE.md will often be enough to have it run them aggressively.
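For example, a short CLAUDE.md section along these lines is often all it takes - the commands here are just illustrative (a hypothetical Rust project); list whatever your project actually uses:

```markdown
## Checking your work
After every change, run:
- `cargo check` - fast type check
- `cargo clippy -- -D warnings` - lint, treating warnings as errors
- `cargo test` - run the test suite
Fix anything these report before moving on.
```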
If that's not enough, you can use hooks to either force it, or to sternly remind it after every file edit or before it attempts to git commit.
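For the hook route, Claude Code reads hook definitions from its settings file. A sketch, assuming the current hooks schema (check the documentation for your version - the schema has evolved - and the lint command is a placeholder):

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "npm run lint --silent" }
        ]
      }
    ]
  }
}
```

A `PostToolUse` hook like this fires after every file edit; a `PreToolUse` hook matching the Bash tool could similarly intercept things like `git commit` before they run.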