Comment by avyvar
1 hour ago
Totally fair question. If you only want one agent to sanity-check one doc change, a skill/prompt is probably enough.
We actually aren’t rebuilding a harness here, it’s Pi with several LLM options to select from. The reason this is a project is that the useful workflow is more like a docs test suite: run realistic user tasks across multiple models, isolate each run in a greenfield sandbox, keep the transcripts/results, and make failures reproducible in CI.
You could ask an existing coding agent to spawn subagents for every task/model pair, but once that matrix grows, running hundreds of subagents on your computer gets messy. It’s also the wrong isolation boundary: for docs testing, you usually want the agent to start from a clean environment with access only to the docs/product surface you’re testing, not your whole working tree or local setup.
No comments yet
Contribute on Hacker News ↗