Comment by daemonologist

3 days ago

Opus 4.7 and 4.8 are also rather "proactive": several times I've seen them try to inspect compiled binaries before there's even a problem, just to check that their changes are included (and if I let them, they often get stuck down that rabbit hole).

I've also seen this. It'll run 'strings' against the binary, then convince itself that the Makefile isn't working right and that some imaginary sandbox is preventing the code from compiling properly. So it compiles the code by hand, never runs 'strings' against the new binary, and proceeds happily.

These kinds of situations are why I gave my AI agents "stray thoughts" (automated insights/suggestions from a separate LLM call with some curated context) that trigger when loop or rabbit-hole detection fires.

Quite a few false positives, but it hasn't had any ill effects so far, aside from increased quota usage.
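The comment doesn't say how the loop detection works, so here is one simple heuristic, sketched in Python purely as an assumption: reduce each tool call to a signature, keep a sliding window of recent calls, and fire a "stray thought" when one signature repeats too often. `LoopDetector`, `maybe_inject_thought`, and the `stray_thought` callback are hypothetical names. The callback stands in for the separate LLM call with curated context, which isn't shown here.

```python
from collections import Counter, deque
from typing import Callable, Deque, List, Optional, Tuple

# Hypothetical representation: a tool call reduced to (tool name, normalized args).
Signature = Tuple[str, str]


class LoopDetector:
    """Assumed heuristic (not the commenter's actual method): flag a possible
    loop when one call signature appears `threshold` times in the last
    `window` calls."""

    def __init__(self, window: int = 8, threshold: int = 3) -> None:
        self.recent: Deque[Signature] = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, tool: str, args: str) -> Optional[List[Signature]]:
        """Record a call; return a snapshot of the window if a loop is suspected."""
        # Collapse whitespace so trivially different invocations still match.
        self.recent.append((tool, " ".join(args.split())))
        _, n = Counter(self.recent).most_common(1)[0]
        if n < self.threshold:
            return None
        snapshot = list(self.recent)
        self.recent.clear()  # reset so one loop doesn't fire on every later call
        return snapshot


def maybe_inject_thought(
    detector: LoopDetector,
    tool: str,
    args: str,
    stray_thought: Callable[[List[Signature]], str],
) -> Optional[str]:
    """If a loop is suspected, ask the (hypothetical) side-channel model for a
    nudge built from the recent calls; otherwise return None."""
    context = detector.observe(tool, args)
    if context is None:
        return None
    return stray_thought(context)
```

Clearing the window after a hit is just one way to avoid re-triggering on the same loop; a low threshold or small window would likely produce the kind of false positives described above, trading quota for earlier nudges.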