Comment by wood_spirit

3 days ago

I have my own LLM-wrapping harness, which does this and has a few more tricks. For example, it doesn't load much MCP by default, but it does have search_mcp and load_mcp tools (and search_skills), so the LLM can find what it needs when it needs it without bloating the baseline context. The LLMs have proved really good at using them. There is also a waypoint tool they can use to record their thinking in the context without it being the final output. I'm also thinking about a search_expert tool to find colleagues it can bring into conversations. And a lot of other stuff.
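
For anyone wanting to try the search-then-load idea, here is a minimal sketch of how it could look. Only the names search_mcp and load_mcp come from the comment above; the ToolSpec and LazyToolRegistry classes, the keyword matching, and the return strings are assumptions made for illustration, not a description of the actual harness.

```python
from dataclasses import dataclass, field


@dataclass
class ToolSpec:
    # Hypothetical minimal description of one tool.
    name: str
    description: str


@dataclass
class LazyToolRegistry:
    # Everything that exists, kept out of the model's context.
    catalog: dict[str, ToolSpec]
    # Only the tools the model has explicitly asked for.
    loaded: dict[str, ToolSpec] = field(default_factory=dict)

    def search_mcp(self, query: str, limit: int = 5) -> list[str]:
        """Naive keyword match over names and descriptions; returns tool names."""
        words = query.lower().split()
        hits = [
            spec.name
            for spec in self.catalog.values()
            if any(w in f"{spec.name} {spec.description}".lower() for w in words)
        ]
        return hits[:limit]

    def load_mcp(self, name: str) -> str:
        """Move one tool from the catalog into the loaded set."""
        spec = self.catalog.get(name)
        if spec is None:
            return f"unknown tool: {name}"
        self.loaded[name] = spec
        return f"loaded {name}: {spec.description}"
```

The point of the split is that only search_mcp and load_mcp themselves need to sit in the baseline prompt; full tool definitions enter the context only after the model asks for them. A real harness would probably want better ranking than substring matching.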

Pro tip that worked well for me with response truncation: in the truncated output, say that the full text is available in /tmp/whereever.txt. That way, the LLM can query and read more using its built-in tools without reissuing the big tool call.
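
A rough sketch of that truncation trick, assuming a character budget and a temp file created with Python's tempfile module. The function name truncate_tool_output, the max_chars default, and the wording of the notice are all made up for illustration.

```python
import os
import tempfile


def truncate_tool_output(text: str, max_chars: int = 2000) -> str:
    """Return short output unchanged; otherwise save the full text to a
    temp file and return a preview that tells the model where to find it."""
    if len(text) <= max_chars:
        return text

    # mkstemp creates the file and returns an open descriptor plus its path.
    fd, path = tempfile.mkstemp(prefix="tool_output_", suffix=".txt")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(text)

    omitted = len(text) - max_chars
    return (
        text[:max_chars]
        + f"\n[truncated: {omitted} more characters; full text is available in {path}]"
    )
```

The model can then grep or read a slice of that file with whatever file tools it already has, instead of rerunning the expensive call. Cleaning up old temp files is left out here.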

Great approach. I did that with my opencode-based setup as well; it's neat and fun to tune skills and MCP loaders and stuff. Then I got fed up with opencode's design limitations. My own harness work then went on hold in favor of a harness-puppeteer paradigm, but that one has also been on hold! Right now I'm mostly pulling on the thread of making it easier just to review the voluminous conversation turns!