Comment by tim-projects

19 hours ago

This is exactly the problem I've been working on, and I see others are too. When you implement quality-control gates, everything works better. They solve so many of the basic problems LLMs create: claiming code is finished when it isn't, skipping tests, introducing regressions, failing basic code validation, etc.

I'm finding that the better the quality gates are, the lower-quality LLM you can use for the same result (at a cost of time).

Exactly! I don't babysit TDD anymore. I have another agent that does that for me, and honestly it sometimes catches things I would have missed if I were the one babysitting.

Hooks do wonders here. The payload contains a lot of information about the pending action the agent wants to take. Combine that with the most recent n events from the agent's session history and you have rich enough context to pass to another agent, which validates the action through the SDK.
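As a rough sketch of what that context assembly might look like (the payload field names and the event shape here are assumptions for illustration, not Probity's actual code):

```python
import json

def build_validation_context(payload: dict, recent_events: list, n: int = 5) -> str:
    """Combine the pending action from a hook payload with the last n
    session events into a single JSON string for a validation agent."""
    # Field names (tool_name, tool_input) follow common hook payload shapes;
    # adjust them to whatever your agent runtime actually emits.
    action = {
        "tool": payload.get("tool_name"),
        "input": payload.get("tool_input"),
    }
    history = recent_events[-n:]  # only the most recent n events
    return json.dumps({"pending_action": action, "recent_events": history}, indent=2)
```

The resulting string can then be sent to a second agent through whichever SDK you are already logged in with.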

This way the validation uses the same subscription you're logged in to, whether you're using Claude Code, Codex, or Copilot. The validation agent responds in a JSON format that you can easily parse and return, letting you allow the action through or block it with direction and guidance. I'm genuinely impressed by how well this works considering how simple it is.
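A minimal sketch of that parse-and-decide step (the verdict schema with `allow` and `guidance` fields is an assumption; match it to whatever format you instruct the validation agent to emit):

```python
import json

def apply_verdict(raw_response: str) -> dict:
    """Parse the validation agent's JSON verdict and map it onto a
    hook-style response: let the action through, or block it with guidance."""
    try:
        verdict = json.loads(raw_response)
    except json.JSONDecodeError:
        # Fail open if the agent returns malformed JSON; fail closed
        # instead if your risk tolerance demands it.
        return {"decision": "allow"}
    if verdict.get("allow", False):
        return {"decision": "allow"}
    return {"decision": "block", "reason": verdict.get("guidance", "Validation failed.")}
```

The blocked branch carries the guidance back to the acting agent, so a rejection comes with direction rather than a bare "no".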

You can find my approach here: https://github.com/nizos/probity