Comment by jedberg

8 hours ago

> At this point, I either jump back to new design/plan files, or dive into the debug flow. Similar to the plan prompting, debug is instructed to review the current implementation, and outline N-M hypotheses for what could be wrong.

I'm biased because my company makes a durable execution library, but I'm super excited about the debug workflow we recently enabled when we launched both a skill and MCP server.

You can use the skill to tell your agent to build with durable execution (and it does a pretty great job the first time in most cases) and then you can use the MCP server to say things like "look at the failed workflows and find the bug". And since it has actual checkpoints from production runs, it can zero in on the bug a lot quicker.

We just dropped a blog post about it: https://www.dbos.dev/blog/mcp-agent-for-durable-workflows

4 comments

jedberg

zknill 7 hours ago

Why an MCP? dbos already ships a cli that appears to have the same features. Why an MCP over a skill that gives context on using the cli?

https://docs.dbos.dev/python/reference/cli

jumploops 7 hours ago

> we launched both a skill and MCP server.
My guess is that the MCP was easy enough to add, and some tools only support MCP.
Personal opinion: MCP is just codified context pollution.

jumploops 8 hours ago

This is great, giving agents access to logs (dev or prod) tightens the debug flow substantially.

With that said, I often find myself leaning on the debug flow for non-errors e.g. UI/UX regressions that the models are still bad at visualizing.

As an example, I added a "SlopGoo" component to a side project, which uses an animated SVG to produce a "goo" like effect. Ended up going through 8 debug docs[0] until I was satisified.

[0]https://github.com/jumploops/slop.haus/tree/main/debug

nubinetwork 6 hours ago

> giving agents access to logs (dev or prod) tightens the debug flow substantially.
Unless the agent doesn't know what it's doing... I've caught Gemini stuck in an edit-debug loop making the same 3-4 mistakes over and over again for like an hour, only to take the code over to Claude and get the correct result in 2-3 cycles (like 5-10 minutes)... I can't really blame Gemini for that too much though, what I have it working on isn't documented very well, which is why I wanted the help in the first place...