Comment by sothatsit
14 days ago
Each agent having its own fresh context window for each task is probably, on its own, a good way to improve quality. And I can imagine agents reviewing each other's work improving quality further, similar to how GPT-5 Pro improves upon GPT-5 Thinking.
There's no need to anthropomorphize, though. One loop that maintains some state and various context trees gets you all of that in a more controlled fashion, and you can do things like persist KV caches across sessions, roll back a session globally, use different models for different tasks, etc. Assuming a one-to-one-to-one relationship between loop, LLM, and context sounds cooler (distributed independent agents), but ultimately that approach just limits what you can do and makes coordination a lot harder, for very little realizable gain.
The solutions you suggest are themselves multiple agents. An agent is nothing more than a linear context plus a system that calls tools in a loop while appending to that context. Whether you run them in a single thread, forking the context and hot-swapping between branches, or in multiple threads where each thread keeps track of its own context, you are running multiple agents either way.
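To make "an agent is just a linear context plus a tool loop" concrete, here is a minimal sketch. The `llm` callable and the message shapes are assumptions, not any particular vendor's API:

```python
import json

def run_agent(llm, tools, task):
    """One agent: a linear context, and a loop that calls tools
    and appends every result back onto that same context."""
    context = [{"role": "user", "content": task}]
    while True:
        reply = llm(context)                 # stand-in for any model call
        context.append(reply)
        if reply.get("tool_call") is None:   # no tool requested: we're done
            return reply["content"]
        name = reply["tool_call"]["name"]
        args = reply["tool_call"]["args"]
        result = tools[name](**args)         # execute the requested tool
        context.append({"role": "tool", "content": json.dumps(result)})
```

Running N of these, whether time-sliced in one thread or spread across threads, is "multiple agents" by this definition regardless of the scheduling.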
Fundamentally, forking your context, rolling back your context, or whatever else you want to do to your context also has coordination costs. The models still have to decide when to take those actions. If instead you do it manually, you haven't really solved the context problems; you've just handed them to the human in the loop.
I guess there needs to be a definition of "agent". To my intuition, the "agent" approach means multiple independent AI automata working in parallel and communicating via some async channels, each managing only its own context, each "always on", always doing something. The orchestrator is its own automaton and assigns agents to tasks, communicating through the same channels, mimicking the behavior and workflow of an engineering team composed of multiple independent people.
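Under that definition, the "always on" setup might be sketched with threads and channels like this. The round-robin assignment and the stubbed-out agent body are assumptions; a real agent would run an LLM/tool loop where the `f-string` is:

```python
import queue
import threading

def agent_worker(name, inbox, results):
    """An 'always on' agent: owns its private context, pulls tasks off
    its channel, and reports back asynchronously."""
    context = []  # this agent's own linear context
    while True:
        task = inbox.get()
        if task is None:          # shutdown signal from the orchestrator
            return
        context.append(task)
        results.put((name, f"done: {task}"))  # stand-in for real work

def orchestrate(tasks, n_agents=2):
    """The orchestrator is its own automaton: it assigns tasks to agents
    over channels and collects their results over another channel."""
    results = queue.Queue()
    inboxes = [queue.Queue() for _ in range(n_agents)]
    workers = [
        threading.Thread(target=agent_worker, args=(i, inboxes[i], results))
        for i in range(n_agents)
    ]
    for w in workers:
        w.start()
    for i, task in enumerate(tasks):
        inboxes[i % n_agents].put(task)      # assign tasks round-robin
    done = [results.get() for _ in tasks]    # wait for every result
    for box in inboxes:
        box.put(None)                        # tell each agent to stop
    for w in workers:
        w.join()
    return done
```

Note how much of this code is channel plumbing rather than actual work, which is the coordination overhead being debated.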
I see this as different from a single process loop that directly manages the contexts, models, system prompts, etc. I get that it's not that different; like FP vs. OOP, you can do the same thing in either. But I think the end result is simpler if we think of it as a single loop that manages contexts directly to complete a project, rather than building an async communication and coordination system.
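The single-loop alternative can be sketched just as briefly. This is a hypothetical illustration: the loop forks a fresh context branch per subtask, and only the summary is merged back into the trunk (so each task still gets a clean window, without any message passing):

```python
import copy

def run_project(llm, steps):
    """One loop that manages context branches directly: fork a branch
    per subtask, run it, fold the result back into the trunk, and
    discard the branch. `llm` is a stand-in for any model call, and
    could be a different model for different steps."""
    trunk = [{"role": "system", "content": "project state"}]
    for step in steps:
        branch = copy.deepcopy(trunk)              # fork: fresh context per task
        branch.append({"role": "user", "content": step})
        summary = llm(branch)                      # run the subtask on the branch
        trunk.append({"role": "assistant", "content": summary})  # merge summary only
    return trunk
```

Rollback here is just truncating `trunk`, and nothing ever has to decide when to read a channel.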