
Comment by flo_r

1 day ago

GitHub Issues works surprisingly well as an agent board: labels for state, one issue per feature. The part I haven't figured out yet is how to know when the output is actually done vs. just "looks done" to the agent.

Been thinking about this. "Done vs. looks done" is partly a question of who is accountable for calling it done, and the trap is that the agent that did the work is often also the one declaring it done.

Cheapest fix: a separate done-caller (another agent or you) checking against criteria written before the work. The reviewer is never the author. (Basically RACI: responsible != accountable.)
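A rough sketch of that gate, in case it helps to see it as code. The `Ticket` class, the `call_done` function, and the agent names are all made up for illustration; nothing here is tied to GitHub or any particular agent framework.

```python
from dataclasses import dataclass, field


@dataclass
class Ticket:
    """Hypothetical ticket record; field names are assumptions."""
    author: str                  # agent or human who did the work
    criteria: list[str]          # acceptance criteria, written before the work
    verdicts: dict[str, bool] = field(default_factory=dict)  # set by reviewer


def call_done(ticket: Ticket, reviewer: str, verdicts: dict[str, bool]) -> bool:
    """Return True only if a non-author reviewer passed every criterion."""
    if reviewer == ticket.author:
        # Responsible != accountable: the author cannot sign off.
        raise PermissionError("reviewer must not be the author")
    if not ticket.criteria:
        # No criteria written up front means nothing to judge against.
        raise ValueError("ticket has no acceptance criteria")
    missing = [c for c in ticket.criteria if c not in verdicts]
    if missing:
        raise ValueError(f"no verdict for: {missing}")
    ticket.verdicts = dict(verdicts)
    return all(verdicts[c] for c in ticket.criteria)


ticket = Ticket(
    author="agent-a",
    criteria=["email field rejects invalid input", "error banner is shown"],
)
done = call_done(
    ticket,
    reviewer="agent-b",
    verdicts={
        "email field rejects invalid input": True,
        "error banner is shown": False,
    },
)
print(done)  # False: one criterion failed, so the ticket is not done
```

The point of raising on `reviewer == ticket.author` rather than returning False is that a self-review is a process error, not a failed review.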

I find that well-described but concise acceptance criteria do a good job of anchoring the LLM to the correct output. Also have the agents take screenshots of any UI work and post them on the ticket as proof.
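One way to make this checkable, as a sketch only: assume the criteria live in the issue body as a Markdown task list (`- [ ]` / `- [x]`) and that screenshots show up in comments as Markdown images or `<img>` tags. The function names and the `is_ui_work` flag are hypothetical, and the issue text is passed in as plain strings rather than fetched from any API. A ticked box is still only the agent's claim, so this is a precondition for review, not a replacement for it.

```python
import re

# Markdown task-list item: "- [ ] text" or "- [x] text".
CHECKBOX = re.compile(r"^[ \t]*[-*] \[( |x|X)\] (.+)$", re.MULTILINE)
# Markdown image or HTML <img> tag, treated here as screenshot evidence.
IMAGE = re.compile(r"!\[[^\]]*\]\([^)]+\)|<img\s[^>]*src=", re.IGNORECASE)


def unchecked_criteria(issue_body: str) -> list[str]:
    """Return the text of every task-list item that is not ticked."""
    return [
        text.strip()
        for mark, text in CHECKBOX.findall(issue_body)
        if mark == " "
    ]


def has_screenshot(comments: list[str]) -> bool:
    """True if any comment contains an image."""
    return any(IMAGE.search(c) for c in comments)


def review_blockers(issue_body: str, comments: list[str], is_ui_work: bool) -> list[str]:
    """List reasons the ticket is not yet ready to hand to a reviewer."""
    problems = [f"unchecked: {c}" for c in unchecked_criteria(issue_body)]
    if is_ui_work and not has_screenshot(comments):
        problems.append("no screenshot attached")
    return problems


body = """## Acceptance criteria
- [x] Login form validates email
- [ ] Error banner shown on failure
"""
print(review_blockers(body, ["done!"], is_ui_work=True))
```

An empty list means the ticket can go to the reviewer; anything else goes back to the agent with the list as feedback.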