Comment by jumploops

12 hours ago

I do something similar, but across three doc types: design, plan, and debug

Design works similarly to your project.md file, but per feature request. I also explicitly ask it to outline open questions/unknowns.

Once the design doc (i.e. design/[feature].md) has been sufficiently iterated on, we move to the plan doc(s).

The plan docs are structured like `plan/[feature]/phase-N-[description].md`
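The layout is easy to script; a minimal sketch (the feature and phase names here are made-up examples, not from the original workflow):

```python
from pathlib import Path

# Hypothetical feature; follows the design/[feature].md and
# plan/[feature]/phase-N-[description].md conventions above.
feature = "dark-mode"

Path("design").mkdir(exist_ok=True)
Path("design", f"{feature}.md").touch()

plan_dir = Path("plan") / feature
plan_dir.mkdir(parents=True, exist_ok=True)
for phase in ["phase-1-css-variables", "phase-2-toggle-component"]:
    (plan_dir / f"{phase}.md").touch()
```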

From here, the agent iterates until the plan is "done", stopping only if it encounters some build/install/run limitation.

At this point, I either jump back to new design/plan files, or dive into the debug flow. Similar to the plan prompting, debug is instructed to review the current implementation, and outline N-M hypotheses for what could be wrong.

We review these hypotheses, sometimes iterate, and then tackle them one by one.

An important note for debug flows: as with manual debugging, it's often better to have the agent instrument logging/traces/etc. to confirm a hypothesis before moving directly to a fix.
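Concretely, the "instrument first" step can be as small as a log line around the suspect code. A sketch with a hypothetical function (`apply_discount` and the hypothesis are invented for illustration):

```python
import logging

logging.basicConfig(level=logging.DEBUG,
                    format="%(asctime)s %(name)s %(message)s")
log = logging.getLogger("debug.hypothesis-1")

def apply_discount(price, rate):
    # Hypothesis 1: `rate` arrives as a percentage (20) instead of a
    # fraction (0.2). Log the raw inputs to confirm before "fixing".
    log.debug("apply_discount price=%r rate=%r", price, rate)
    return price * (1 - rate)

apply_discount(100.0, 20)  # the emitted log line confirms or refutes it
```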

Using this method has led to a 100% vibe-coded success rate on both greenfield and legacy projects.

Note: my main complaint is the sheer number of markdown files over time, but I haven't gotten around to (or needed to) automate this yet, as sometimes these historic planning/debug files are useful for future changes.

My "heavy" workflow for large changes is basically as follows:

0. Create a .gitignored directory where agents can keep docs. Every project deserves one of these, not just for LLMs but also for logs, random JSON responses you captured to a file, etc.

1. Ask the agent to create a file for the change and rephrase the prompt in its own words. My prompts are super sloppy, full of typos, with zero emphasis on good grammar, so it's a good first step to make sure the agent understands what I want it to do. It also helps preserve the prompt across sessions.

2. Ask the agent to do research on the relevant subsystems and dump it to the change doc. This is to confirm that the agent correctly understands what the code is doing and isn't operating on mistaken assumptions. If something goes wrong here, it's a good opportunity to refactor or add comments to make future mistakes less likely.

3. Spec out behavior (UI, CLI, etc.). The agent is allowed to ask for decisions here.

4. Given the functional spec, figure out the technical architecture, same workflow as above.

5. High-level plan.

6. Detailed plan for the first incomplete high-level step.

7. Implement, manually review code until satisfied.

8. Go to 6.
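Step 0 above takes only a few lines to automate. A sketch, assuming a hypothetical directory name (`.scratch` is my invention, use whatever you like):

```python
from pathlib import Path

# Hypothetical name for the agent scratch directory.
scratch = Path(".scratch")
scratch.mkdir(exist_ok=True)

# Append it to .gitignore if it isn't already listed.
gitignore = Path(".gitignore")
entry = ".scratch/"
lines = gitignore.read_text().splitlines() if gitignore.exists() else []
if entry not in lines:
    with gitignore.open("a") as f:
        f.write(entry + "\n")
```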

> At this point, I either jump back to new design/plan files, or dive into the debug flow. Similar to the plan prompting, debug is instructed to review the current implementation, and outline N-M hypotheses for what could be wrong.

I'm biased because my company makes a durable execution library, but I'm super excited about the debug workflow we recently enabled when we launched both a skill and an MCP server.

You can use the skill to tell your agent to build with durable execution (and it does a pretty great job the first time in most cases) and then you can use the MCP server to say things like "look at the failed workflows and find the bug". And since it has actual checkpoints from production runs, it can zero in on the bug a lot quicker.

We just dropped a blog post about it: https://www.dbos.dev/blog/mcp-agent-for-durable-workflows

  • This is great; giving agents access to logs (dev or prod) tightens the debug flow substantially.

    That said, I often find myself leaning on the debug flow for non-errors, e.g. UI/UX regressions that the models are still bad at visualizing.

    As an example, I added a "SlopGoo" component to a side project, which uses an animated SVG to produce a "goo"-like effect. I ended up going through 8 debug docs[0] until I was satisfied.

    [0] https://github.com/jumploops/slop.haus/tree/main/debug

    • > giving agents access to logs (dev or prod) tightens the debug flow substantially.

      Unless the agent doesn't know what it's doing... I've caught Gemini stuck in an edit-debug loop, making the same 3-4 mistakes over and over for about an hour, only to take the code over to Claude and get the correct result in 2-3 cycles (5-10 minutes). I can't really blame Gemini too much, though: what I have it working on isn't documented very well, which is why I wanted the help in the first place.

> Note: my main complaint is the sheer number of markdown files over time, but I haven't gotten around to (or needed to) automate this yet, as sometimes these historic planning/debug files are useful for future changes.

FWIW, what you describe maps well to Beads. Your directory structure becomes dependencies between issues, and/or parent/child issue relationships, and/or labels ("epic", "feature", "bug", etc.). Your markdown moves from files to issue entries hidden away in a JSONL file, with a local DB as cache.
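I don't know Beads' exact schema offhand, but the general shape is one JSON object per line, with the hierarchy and labels as fields rather than directories. A generic sketch (field names are hypothetical, not Beads' actual ones):

```python
import json

# Hypothetical issue entries mirroring a design/plan directory tree.
issues = [
    {"id": "bd-1", "title": "Dark mode", "labels": ["epic"],
     "parent": None},
    {"id": "bd-2", "title": "Phase 1: CSS variables",
     "labels": ["feature"], "parent": "bd-1", "deps": []},
]

with open("issues.jsonl", "w") as f:
    for issue in issues:
        f.write(json.dumps(issue) + "\n")

# Reading it back: one JSON object per line.
with open("issues.jsonl") as f:
    loaded = [json.loads(line) for line in f]
```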

Your current file-system "UI" vs Beads command line UI is obviously a big difference.

Beads provides a kind of conceptual bottleneck, which I think helps when working with LLMs. Beads is more self-documenting, while a file system can be "anything".

Similar here, but we have the agent write the test cases after writing the plan, then iterate until the implementation passes them.

I have a similar process and have thought about committing all the planning files, but I've found that they tend to end up in an outdated state by the time the implementation is done.

Better imo is to produce a README or dev-facing doc at the end that distills all the planning and implementation into a final authoritative overview. This is easier for both humans and agents to digest than a bunch of meandering planning files.