← Back to context

Comment by ClintEhrlich

8 days ago

Hi NWU,

We don't have any other materials yet, but let's see if this lands for you. I can run you through a couple simpler versions of the system, why they don't work, and how that informs our ultimate design.

The most basic part of the system is "two layers". Layer 1 is the "ground truth" of the conversation - the whole text the user sees. Layer 2 is what the model sees, i.e., the active context window.

In a perfect world, those would be the same thing. But, as you know, context lengths aren't long enough for that, so we can't fit everything from Layer 1 into Layer 2.

So instead we keep a "pointer" to the appropriate part of Layer 1 in Layer 2. That pointer takes the form of a summary. But it's not a summary designed to contain all information. It's more like a "label" that makes sure the model knows where to look.

The naive version of the system would allow the main model to expand Layer 2 summaries by importing all of the underlying data from Layer 1. But this doesn't work well, because then you just end up re-filling the Layer 2 context window.

So instead you let the main model clone itself, the clone expands the summary in its context (and can do this for multiple summaries, transforming each into the original uncompressed text), and then the clone returns whatever information the main thread requires.

Where this system would not fully match the capabilities of RLMs is that, by writing a script that calls itself e.g. thousands of times, an RLM has the ability to make many more recursive tool calls than can fit in a context window. So we fix that using operator-level recursion, i.e., we give the LLM a tool, map, that executes arbitrary recursion, without the LLM having to write a custom script to accomplish that.

Hope this helps!

- Clint

I am in the process of trying to integrate LCM in to my own personal assistant agent for its context management system. The main human facing agent will not be a coding agent so ill be modifying the system prompt and some other things quite heavily but core concepts of the system will be as the backbone. Now that I am paying around with it, I am hoping you can answer some questions. I notice that the system prompt of the agent mutates as local time is injected in to the system prompt itself. If that's whats happening, you are destroying any hopes of caching from the provider are you not? Am I reading this correctly or was this a deliberate choice for some reason... instead of appending at the end of the users turn like a system metadata info that way you preserve the head? Thanks.