Comment by cyanydeez

2 hours ago

so opencode has multiple agents for the primary, and that's all I'm doing, _but_ I pair it with llama.cpp's thinking-mode options:

--reasoning-budget N token budget for thinking: -1 for unrestricted, 0 for immediate end, N>0 for token budget (default: -1) (env: LLAMA_ARG_THINK_BUDGET)

--reasoning-budget-message MESSAGE message injected before the end-of-thinking tag when reasoning budget

Currently, opencode doesn't set this itself, but a harness could use --reasoning-budget-message to inject its own custom message. So I tailored an agent with a message that tells it to either compress the context via a dynamic compression plugin or hand the work to a subagent to avoid bloating the context.
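For anyone who wants to try this, here is a rough sketch of how a harness might assemble the server command with both flags. The function name, the model path, and the wording of the message are my own placeholders, not anything opencode or llama.cpp ships; only the two flags come from the help text quoted above, and I'm assuming the server binary is `llama-server` and takes the model via `-m`.

```python
import shlex

# Hypothetical nudge text; the real wording is whatever suits your agent.
BUDGET_MESSAGE = (
    "Thinking budget reached. Compress the context or delegate the "
    "remaining work to a subagent instead of reasoning further."
)


def build_llama_server_cmd(model_path, reasoning_budget=-1, budget_message=None):
    """Build an argv list for llama-server with the reasoning-budget flags.

    reasoning_budget: -1 unrestricted, 0 ends thinking immediately,
    N > 0 caps thinking at N tokens (as described in the help text).
    """
    if reasoning_budget < -1:
        raise ValueError("reasoning_budget must be -1, 0, or a positive integer")
    cmd = [
        "llama-server",
        "-m", model_path,
        "--reasoning-budget", str(reasoning_budget),
    ]
    # Only pass the message flag when a message was actually supplied.
    if budget_message:
        cmd += ["--reasoning-budget-message", budget_message]
    return cmd


# Placeholder model path; nothing is launched here, the command is only printed.
cmd = build_llama_server_cmd("model.gguf", 2048, BUDGET_MESSAGE)
print(shlex.join(cmd))
```

Building the argv as a list (rather than one string) keeps the multi-word message as a single argument if you later hand it to `subprocess`.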

It's mildly successful, but you can tell that as the context grows, it becomes more and more narrow-sighted or wayward.