Comment by krackers

10 hours ago

>I think there is something extra you get with a proposer-verifier style loop, even if both sides are using the same base model.

DeepSeekMath-V2 seems to show this: increasing the number of prover/verifier iterations increases accuracy. And this is with a model that has already undergone RL under a prover/verifier selection process.

However, this type of subagent communication maintains full context, which makes it different from the "break it into tasks" style of sharding work among subagents. I'm less convinced by the latter, because a problem is often more complex than the sum of its parts: the interdependencies are what make it complex, so you need to consider each part in relation to the others, not in isolation.
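To make the distinction concrete, here is a minimal Python sketch of a proposer-verifier loop in which both sides see the full running transcript. Everything in it is an assumption for illustration: `refine`, `propose`, `verify`, and the toy arithmetic problem are hypothetical stand-ins, not DeepSeekMath-V2's actual procedure, and a real system would put model calls where the toy functions are.

```python
from typing import Callable, List, Tuple

Transcript = List[str]

def refine(
    problem: str,
    propose: Callable[[Transcript], str],
    verify: Callable[[Transcript, str], Tuple[bool, str]],
    max_rounds: int = 4,
) -> Tuple[str, Transcript]:
    """Alternate proposer and verifier over one shared, growing transcript."""
    transcript: Transcript = [problem]
    candidate = ""
    for _ in range(max_rounds):
        # The proposer sees the problem plus every earlier attempt and critique.
        candidate = propose(transcript)
        accepted, critique = verify(transcript, candidate)
        transcript += [candidate, critique]
        if accepted:
            break
    return candidate, transcript

# Hypothetical toy stand-ins for the two model roles.
def toy_propose(transcript: Transcript) -> str:
    attempts = (len(transcript) - 1) // 2  # each round adds two entries
    return str(5 + attempts)               # guesses 5, 6, 7, ...

def toy_verify(transcript: Transcript, candidate: str) -> Tuple[bool, str]:
    ok = int(candidate) ** 2 == 49
    return ok, "accepted" if ok else f"{candidate} squared is not 49"

answer, transcript = refine("find x > 0 with x*x == 49", toy_propose, toy_verify)
```

Nothing is partitioned here: every round reads the whole history, which is the opposite of handing each subagent an isolated slice of the problem.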

The specific way in which we invoke the subagents is critical to the performance of the system. If we use a true external call stack and force proper depth-first recursion, the effective context can be maintained to whatever depth is desired.

Parallelism and BFS-style approaches do not have this property. Anything that happens within the context or token stream is a much weaker solution. Most agent frameworks are interested in the appearance of speed, so they miss the nuance of this execution model.
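A rough sketch of what I mean by an external call stack, in Python. The names (`Frame`, `run_depth_first`, `split`, `solve`) and the toy task tree are hypothetical, and `solve` stands in for a subagent call. The point is only that the stack lives outside any model context, so each call can be handed its exact ancestor path plus the results of its finished children, at any depth.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Frame:
    task: str
    pending: List[str]                       # subtasks not yet started
    results: List[str] = field(default_factory=list)

def run_depth_first(
    root: str,
    split: Callable[[str], List[str]],
    solve: Callable[[List[str], List[str]], str],
) -> str:
    """Walk the task tree depth-first using an explicit stack of frames."""
    stack = [Frame(root, list(split(root)))]
    while True:
        frame = stack[-1]
        if frame.pending:
            # Descend into the next child before touching any sibling.
            child = frame.pending.pop(0)
            stack.append(Frame(child, list(split(child))))
            continue
        # All children are finished: solve with the full ancestor path.
        path = [f.task for f in stack]
        result = solve(path, frame.results)
        stack.pop()
        if not stack:
            return result
        stack[-1].results.append(result)

# Hypothetical toy task tree and "subagent".
TREE: Dict[str, List[str]] = {"a": ["a.1", "a.2"], "a.1": ["a.1.x"]}
seen_paths: List[List[str]] = []

def toy_split(task: str) -> List[str]:
    return TREE.get(task, [])

def toy_solve(path: List[str], child_results: List[str]) -> str:
    seen_paths.append(list(path))            # record the context each call saw
    return f"{path[-1]}({','.join(child_results)})"

final = run_depth_first("a", toy_split, toy_solve)
```

A BFS or parallel fan-out would start `a.1` and `a.2` together, so neither could be given the other's finished result, while here `a.2` only starts after the whole `a.1` subtree has returned.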