Comment by slipheen

2 hours ago

I read the GitHub repo, but still don't quite understand-

What exactly is the advantage of doing this vs just running a prompt in my existing coding agent?

I don't understand why this is a harness/project vs just for example, a skill?

I'm confident there's a good reason, I just don't understand.

1 comment

slipheen

avyvar 1 hour ago

Totally fair question. If you only want one agent to sanity-check one doc change, a skill/prompt is probably enough.

We actually aren’t rebuilding a harness here, it’s Pi with several LLM options to select from. The reason this is a project is that the useful workflow is more like a docs test suite: run realistic user tasks across multiple models, isolate each run in a greenfield sandbox, keep the transcripts/results, and make failures reproducible in CI.

You could ask an existing coding agent to spawn subagents for every task/model pair, but once that matrix grows, running hundreds of subagents on your computer gets messy. It’s also the wrong isolation boundary: for docs testing, you usually want the agent to start from a clean environment with access only to the docs/product surface you’re testing, not your whole working tree or local setup.