← Back to context

Comment by jonnyasmar

5 hours ago

Building on the "investigation > patch" point — running Claude Code, Codex, and Gemini CLI daily, the pattern I keep noticing is that auto-fix is fine on "obvious bug, obvious fix" (off-by-one, null check, missing await, error not propagated). It falls over on "subtle invariant" bugs where the existing code is intentionally weird to preserve something non-obvious — the PR looks right and breaks something three modules away.

The tool I'd actually want isn't "tries harder to fix everything." It's one that credibly says "this touches an invariant I can't see — here's what I think might happen, you handle it." Calibrated humility beats confident patches.

Curious how your high-confidence threshold actually works. Self-reported model certainty (notoriously unreliable), test coverage in the affected area, blast-radius of the change, something else?