Comment by manmal
3 days ago
As long as you give it deterministic goals / test criteria (compiles, lints, tests, E2E tests, achieve 100% parity with existing solution etc) it will brute force its way to a solution. Codex will work for hours/days, even weeks sometimes, until it has finished. A person would never work this way, but since this just runs in the background, there’s no issue with this approach except if you need it fast.
No, it might figure out the solution but even after many days there's no assurance that it won't get stuck making the same mistakes over and over again, never getting closer to a solution. I've seen this many times.
Getting in a loop does still happen, yes. If you run codex in tmux and let another agent just occasionally check on progress, it can be prevented. That’s not even expensive - checking every 30 minutes suffices. The watchdog agent can then press Esc in tmux and send a message, maybe do some research to get it unstuck etc
Definitely have not seen that with Opus 4.5.
Neither have I, personally, but I’ve seen reports this can happen on very hard problems, where the goal just cannot be reached from a local optimum. Getting unstuck by trying something new is something a watchdog agent could prompt it.