
Comment by GodelNumbering

1 day ago

Thanks.

1. Context management - Don't bother with pruning unless your API doesn't support caching. Every prune breaks the cache, and you lose the ~90% discount on cached tokens.

2. I did some work improving Cline's subagent feature that Dirac inherited. In my experience, not all models are trained to delegate work effectively, so YMMV. A common pitfall to watch: what happens if one or more subagents get stuck in a loop or, for whatever reason, never return? You need a mechanism in the main agent to control them.
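To make the point above concrete, here is a minimal sketch (not Cline's or Dirac's actual implementation) of a main agent supervising subagents with a hard wall-clock timeout, so a stuck or looping subagent is cancelled instead of hanging the run. `run_subagent` is a hypothetical stand-in for whatever actually drives a subagent's turn loop:

```python
import asyncio

async def run_subagent(task: str, delay: float) -> str:
    """Stand-in for a subagent: sleeps to simulate work (or a hang)."""
    await asyncio.sleep(delay)
    return f"done: {task}"

async def supervise(tasks: list[tuple[str, float]], timeout: float) -> list[str]:
    """Run each subagent under a timeout; surface a summary on cancellation."""
    results = []
    for task, delay in tasks:
        try:
            results.append(await asyncio.wait_for(run_subagent(task, delay), timeout))
        except asyncio.TimeoutError:
            # Cancelled by the supervisor: report instead of hanging forever.
            results.append(f"error: subagent '{task}' exceeded {timeout}s and was cancelled")
    return results

print(asyncio.run(supervise([("search", 0.01), ("stuck-loop", 10.0)], timeout=0.1)))
```

A real harness would run subagents concurrently (`asyncio.gather`) and propagate the cancellation into the subagent's tool calls, but the shape is the same: the main agent owns the timeout, not the subagent.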

It depends on where you prune and how the specific prefix cache you're targeting works. Pruning or condensing recent items that are unnecessary probably pays for itself.
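A toy illustration of why the position of the prune matters (this models a generic prefix cache, not any specific provider's): the discounted rate applies only to the longest leading run of messages that is byte-identical to the prior request, so editing an early message invalidates everything after it, while condensing only the most recent items preserves almost the whole cached prefix:

```python
def cached_prefix_len(previous: list[str], current: list[str]) -> int:
    """Number of leading messages shared verbatim with the prior request."""
    n = 0
    for old, new in zip(previous, current):
        if old != new:
            break
        n += 1
    return n

history = ["sys", "user1", "tool_out1", "user2", "tool_out2"]

# Prune an early tool output: the cache hit collapses to the first 2 messages.
pruned_early = ["sys", "user1", "user2", "tool_out2"]
print(cached_prefix_len(history, pruned_early))   # 2

# Condense only the most recent item: everything before it still hits.
pruned_recent = ["sys", "user1", "tool_out1", "user2", "tool_out2 (summary)"]
print(cached_prefix_len(history, pruned_recent))  # 4
```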

1. For me, pruning is less about cost than performance. Recent research suggests that a smaller context is nearly always better, and many harnesses implement a sliding window for tool-output pruning. Also, not every provider supports caching, and even when one does, the cache may have expired (especially on restored sessions).

2. That's a good hint. I'm currently only experimenting with tighter turn and token limits for subagents, plus an error summary when they're exceeded. Not sure how else (besides steering and prompt engineering) to keep a subagent from going wild...
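The turn/token-limit-plus-error-summary mechanism described above can be sketched roughly like this (a hypothetical harness, with `fake_model_step` standing in for one model/tool round-trip and character count as a crude proxy for tokens):

```python
def fake_model_step(turn: int) -> str:
    """Stand-in for one subagent round-trip; returns some pretend output."""
    return f"turn {turn}: " + "x" * 50

def run_budgeted(max_turns: int, max_chars: int) -> dict:
    """Enforce hard turn and size budgets; return an error summary on overrun."""
    transcript, used = [], 0
    for turn in range(1, max_turns + 1):
        out = fake_model_step(turn)
        used += len(out)
        transcript.append(out)
        if used > max_chars:
            return {"ok": False,
                    "summary": f"aborted after {turn} turns: char budget {max_chars} exceeded ({used})"}
        if "DONE" in out:  # a real harness would detect task completion here
            return {"ok": True, "transcript": transcript}
    return {"ok": False,
            "summary": f"aborted: turn budget {max_turns} exhausted"}

print(run_budgeted(max_turns=3, max_chars=10_000))  # hits the turn limit
print(run_budgeted(max_turns=10, max_chars=100))    # hits the size budget
```

The main agent sees only `summary` on failure, which keeps a runaway subagent's transcript from flooding the parent's context.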