Comment by slipheen
2 hours ago
I read the GitHub repo, but I still don't quite understand:
What exactly is the advantage of doing this vs just running a prompt in my existing coding agent?
Why is this a harness/project rather than just, for example, a skill?
I'm confident there's a good reason, I just don't understand.
Totally fair question. If you only want one agent to sanity-check one doc change, a skill/prompt is probably enough.
We aren’t actually rebuilding a harness here; it’s Pi with several LLM options to choose from. The reason this is a project is that the useful workflow is more like a docs test suite: run realistic user tasks across multiple models, isolate each run in a greenfield sandbox, keep the transcripts/results, and make failures reproducible in CI.
You could ask an existing coding agent to spawn subagents for every task/model pair, but once that matrix grows, running hundreds of subagents on your computer gets messy. It’s also the wrong isolation boundary: for docs testing, you usually want the agent to start from a clean environment with access only to the docs/product surface you’re testing, not your whole working tree or local setup.
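To make the shape of that workflow concrete, here is a rough sketch in Python. It is not the project's actual code and does not use Pi's API. `run_matrix`, `run_agent`, and the `results.json` filename are all hypothetical names made up for illustration, and a temp directory is only a stand-in for a real sandbox (a container or VM would be the stronger boundary). It just shows the three ideas: one run per task/model pair, a fresh environment containing only the docs, and saved transcripts.

```python
import itertools
import json
import shutil
import tempfile
from pathlib import Path


def run_matrix(tasks, models, docs_dir, results_dir, run_agent):
    """Run every task/model pair in its own fresh directory and save results.

    run_agent is a hypothetical callable (task, model, sandbox_path) -> str
    that drives whatever agent you use and returns its transcript.
    """
    results_dir = Path(results_dir)
    results_dir.mkdir(parents=True, exist_ok=True)
    results = []
    for task, model in itertools.product(tasks, models):
        # New temp directory per run: nothing carries over between runs,
        # and the caller's working tree is never copied in.
        with tempfile.TemporaryDirectory(prefix="docs-run-") as tmp:
            sandbox = Path(tmp)
            # Only the docs under test are made available.
            shutil.copytree(docs_dir, sandbox / "docs")
            try:
                transcript = run_agent(task, model, sandbox)
                status = "pass"
            except Exception as exc:
                # A failed run is recorded as a result instead of
                # aborting the rest of the matrix.
                transcript = f"{type(exc).__name__}: {exc}"
                status = "fail"
        results.append(
            {"task": task, "model": model, "status": status, "transcript": transcript}
        )
    # Keep the transcripts on disk so a failure can be inspected later.
    (results_dir / "results.json").write_text(json.dumps(results, indent=2))
    return results
```

With 20 tasks and 5 models this is already 100 runs, which is where a dedicated runner in CI starts to look more practical than subagents on a laptop.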