Comment by agenthustler
13 hours ago
We have been running a lighter-weight version of this for 6 days - a single Claude Code agent that wakes every 2 hours, reads a STATE.md file as its only memory, and decides what to do next (it is currently trying to earn money from scratch: https://dev.to/wpmultitool/my-ai-agent-has-been-trying-to-ma...).
The file-as-persistence approach has been surprisingly effective. Each run, the agent reads what past-self tried, evaluates honestly, and writes conclusions back. We've found that self-evaluation is the hard part, not task tracking.
One thing that did not work: the agent over-iterated on losing approaches. It added SEO features to a site with zero traffic for 8 consecutive runs. The fix was an explicit criterion written into the instructions: if still at $0 after 24 hours of runs, pivot.
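For anyone wanting to try this, the pivot criterion is simple to make mechanical. A minimal sketch (the `runs` list of timestamped revenue figures is a hypothetical stand-in for whatever the agent parses out of STATE.md, not our actual format):

```python
from datetime import datetime, timedelta

def should_pivot(runs, window_hours=24):
    """Decide whether the current approach should be abandoned.

    `runs` is a list of (timestamp, revenue) tuples, oldest first.
    Pivot if we've burned the full window with nothing to show for it.
    """
    if not runs:
        return False
    elapsed = runs[-1][0] - runs[0][0]
    total_revenue = sum(rev for _, rev in runs)
    return elapsed >= timedelta(hours=window_hours) and total_revenue == 0
```

The point is that the check lives in the instructions/code, not in the agent's judgment, so it can't rationalize an 8th run of SEO tweaks.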
Curious whether Mission Control has any mechanism for recognizing when a task should be abandoned vs. retried? That seems like the hardest part of autonomous agent loops.
Update: just shipped the loop detection + decision escalation I mentioned. Here's how it works now:

When you run a "continuous mission" (one click to execute an entire project), the daemon chains tasks automatically: as each finishes, the next batch dispatches based on dependency order. If an agent fails the same task 3 times in a row, loop detection kicks in and auto-creates a decision in the decisions queue, with context about what failed and options (retry with a different approach, skip it, or stop the mission). The human gets an inbox notification and can answer from the UI.

It also posts a mission completion report to the inbox when everything finishes (or stalls): task counts, file paths from the work, and a nudge to check the status board for anything left over.

Still not full self-evaluation in the "did I actually make progress?" sense; that's the next frontier. But the mechanical escalation path is wired end-to-end now. Code's on GitHub if you want to poke at it: https://github.com/MeisnerDan/mission-control
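The escalation logic itself is tiny; the plumbing around it is the real work. A stripped-down sketch of the failure-count-to-decision path (names like `Daemon` and `record_failure` are illustrative here, not Mission Control's actual API):

```python
from dataclasses import dataclass, field

OPTIONS = ["retry with a different approach", "skip it", "stop the mission"]

@dataclass
class Daemon:
    """Tracks consecutive failures per task and escalates at a threshold."""
    fail_counts: dict = field(default_factory=dict)
    decisions: list = field(default_factory=list)

    def record_failure(self, task_id, error, threshold=3):
        self.fail_counts[task_id] = self.fail_counts.get(task_id, 0) + 1
        if self.fail_counts[task_id] >= threshold:
            # Loop detected: stop retrying and surface a decision to the human.
            self.decisions.append({
                "task": task_id,
                "context": error,
                "options": OPTIONS,
            })
            self.fail_counts[task_id] = 0  # reset once escalated
            return "escalated"
        return "retry"
```

In the real thing the decision lands in the inbox and blocks that branch of the mission until the human answers.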
Great question, and I think you're right that self-evaluation is the harder problem.

Right now, Mission Control's daemon handles the mechanical side: exponential backoff on retries (configurable), maxTurns and timeout limits per session to prevent runaway agents, and permanent failure after retries are exhausted. But it's blunt.

That said, what MC does have is the plumbing for human escalation: an inbox system where agents can post decision requests, and a decisions queue where questions get surfaced to the human. That's not wired into the daemon's failure path yet, which is an obvious next step.

I think the real answer here is some kind of evaluation step between retries ("did this attempt make meaningful progress, or am I spinning?"), probably by having the agent review its own output against acceptance criteria before deciding to retry. That's on my radar, but I haven't built it yet.

Curious how you handle it with your STATE.md approach: do you have the agent evaluate its own progress, or do you review manually?
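To sketch what I mean by that evaluation step (purely hypothetical, nothing in MC does this yet; the criteria and spinning check are placeholders for whatever the agent itself would judge):

```python
def evaluate_then_retry(attempt_output, acceptance_criteria, prior_outputs):
    """Decide "done", "retry", or "escalate" between attempts.

    acceptance_criteria: predicates over the attempt's output.
    prior_outputs: outputs of earlier attempts at the same task.
    """
    if all(criterion(attempt_output) for criterion in acceptance_criteria):
        return "done"
    # "Am I spinning?" — if this attempt produced the same thing as a
    # prior one, another blind retry is unlikely to help.
    if attempt_output in prior_outputs:
        return "escalate"
    return "retry"
```

The interesting part is that "same thing as before" probably needs to be semantic rather than literal equality, which is why I'd lean on the agent to make the call rather than a string compare.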