Comment by cheema33

2 months ago

> I'm sure one could solve that more generally, by putting the agent writing the code in a loop with some other code reviewing agent.

This x 100. I get so much better quality code if I have LLMs review each other's code and apply corrections. It is ridiculously effective.

Can you elaborate a little more on your setup? Are you manually copying and pasting code from one LLM to another, or do you have some automated workflow for this?

  • No manual copy/paste. That is not a good use of time. I work in a git repo and point multiple LLMs at it.

    One LLM reviews existing code and the new requirement and then creates a PRD. I usually use Augment Code for this because it has a good index of all local code.

    I then ask Google Gemini to review the PRD, validate it, and find ways to improve it. I then ask Gemini to create a comprehensive implementation plan. It frequently creates a 13-step plan. It would usually take me a month to do this work.

    I then start a new session of Augment Code and feed it the PRD and one of the 13 tasks at a time. Whatever work it does, it checks into a feature branch with a detailed git commit comment. I then ask Gemini to review the output of each task and provide feedback. It frequently finds issues with the implementation or areas for improvement.

    All of this is managed using git. I make LLMs use git. I think I would go insane if I had to copy/paste this much stuff.

    I have a recipe of prompts that I copy/paste. I am trying to find ways to cut that down and am making slow progress in this regard. There are tools like "Task Master" (https://github.com/eyaltoledano/claude-task-master) that do a good job of automating this workflow. However, this tool doesn't allow much customization, e.g. having LLMs review each other's work.

    But, maybe I can get LLMs to customize that part for me...

  • I have been doing this with claude code and openai codex and/or cline. One of the three takes the first pass (usually claude code, sometimes codex), then I will have cline / gemini 2.5 do a "code review" and offer suggestions for fixes before it applies them.
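
For anyone who wants to script this instead of pasting prompts, here is a minimal sketch of the write/review loop both replies describe. It is not either poster's actual setup. The `Writer` and `Reviewer` callables, `run_task`, `run_plan`, and the stub agents are all hypothetical names made up for illustration; in practice each callable would wrap whatever tool you use (Augment Code, Gemini, Claude Code, and so on), and the git check-in step is left out.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical agent interfaces (assumptions, not any tool's real API):
# a writer turns (task, feedback) into code, and a reviewer turns
# (task, code) into feedback, where an empty string means "approved".
Writer = Callable[[str, str], str]
Reviewer = Callable[[str, str], str]


@dataclass
class TaskResult:
    task: str
    code: str
    rounds: int
    approved: bool


def run_task(task: str, writer: Writer, reviewer: Reviewer,
             max_rounds: int = 3) -> TaskResult:
    """Alternate writer and reviewer until approval or max_rounds."""
    feedback = ""
    code = ""
    for round_no in range(1, max_rounds + 1):
        code = writer(task, feedback)    # first pass, or a revision
        feedback = reviewer(task, code)  # a second model critiques it
        if not feedback:                 # nothing left to fix
            return TaskResult(task, code, round_no, True)
    return TaskResult(task, code, max_rounds, False)


def run_plan(tasks: List[str], writer: Writer, reviewer: Reviewer,
             max_rounds: int = 3) -> List[TaskResult]:
    """Feed the plan to the agents one task at a time."""
    return [run_task(t, writer, reviewer, max_rounds) for t in tasks]


# Stub agents standing in for real LLM calls, so the sketch runs offline.
def stub_writer(task: str, feedback: str) -> str:
    doc = '    """Does the work."""\n' if feedback else ""
    return f"def work():  # {task}\n{doc}    pass\n"


def stub_reviewer(task: str, code: str) -> str:
    return "" if '"""' in code else "missing docstring"


if __name__ == "__main__":
    for result in run_plan(["add parser", "add tests"],
                           stub_writer, stub_reviewer):
        print(result.task, result.rounds, result.approved)
```

The `max_rounds` cap is a design choice of this sketch: two models can disagree indefinitely, so an unapproved result is returned for a human to look at rather than looping forever.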