Comment by SwellJoe

11 hours ago

Opus in recent versions is fine beyond 100k, but I usually do try to keep it under 200k.

But this is also why so-called "memory" systems are usually a mistake that makes the models dumber. They don't have memory, they only have context, and every irrelevant fact you shove into the context is less context for the problem. Fewer distractions, better results.

The way to have the agent remember things is to have it document its work, like a human developer would do if they wanted their project to be friendly to other developers working on it. Good developer docs with an index page and a good plan with checklists, in concise Markdown files checked in to the repo, are the ideal memory for models and the ideal docs you need to figure out WTF the model has been up to. Helps with code review, too, whether by humans or another model. There's no downside.
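As a rough illustration of why a checklist-style plan works as "memory": the open items can be recovered mechanically at the start of a session, with nothing else pulled into context. This is only a sketch. The plan text, the `open_tasks` helper, and the `- [ ]` task-list syntax are assumptions for the example, not something the comment specifies.

```python
import re

# Matches task-list items such as "- [ ] todo" or "* [x] done".
TASK_RE = re.compile(r"^\s*[-*]\s+\[( |x|X)\]\s+(.*)$")


def open_tasks(markdown_text):
    """Return the text of every unchecked checklist item, in document order."""
    tasks = []
    for line in markdown_text.splitlines():
        match = TASK_RE.match(line)
        if match and match.group(1) == " ":
            tasks.append(match.group(2).strip())
    return tasks


# Hypothetical plan file contents, inlined so the example is self-contained.
PLAN = """\
# Plan

- [x] Add index page to docs/
- [ ] Document the build steps
- [ ] Write migration notes
"""

print(open_tasks(PLAN))
```

The same file stays readable for a human reviewer, which is the point of keeping it in the repo rather than in a separate memory store.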

At least for me, Opus keeps writing stuff to memories, only to consistently forget to check those memories before making the same mistake again. This ("remember to check memories!") is of course then written as a memory too... Clearly not a very well-working system, yep.

  • Yeah, I see it write stuff to memory pretty regularly, maybe it works sometimes, but for things I want it to stop doing or always do, I make it impossible to do otherwise via lint or some style enforcement, or via a test that fails if code shows up that violates the constraint.

    But, it does a good job following existing conventions in a codebase, as long as they're really consistent. So the more actively you enforce that consistency the more likely it is to do the right thing without memories or prompting.

    I don't like "never do" or "always do" type rules in AGENTS.md or in memory, as it often over-interprets them and ties itself in knots trying to satisfy an impossible set of goals.

  • In my own multi-agent framework I use cheap models to check the responses of the expensive models, as well as using multiple expensive models adversarially in debate. The cheap models are great at spotting, e.g., the model getting stuck alternating between two broken ideas, not following code conventions, missing a step in the skill, and so on. I’m currently working on making them detect user corrections and police them going forward, so they can intervene when the expensive models forget the thing you just corrected them about.

  • I've explicitly banned Opus from creating memories unprompted, as it would often save info that's incorrect and which would then be propagated to future sessions until caught. Ugh x 10.
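For the approach in the first reply above (a test that fails if code shows up that violates a constraint), a minimal sketch might look like the following. The rule itself (no bare `print()` calls), the `src` directory, and the function names are all hypothetical, picked only to make the example concrete. A real project might prefer a proper lint rule over a regex.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical constraint: no bare print() calls in the codebase.
FORBIDDEN = re.compile(r"^\s*print\(")


def find_violations(root):
    """Return (file name, line number) for every line matching FORBIDDEN."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for number, line in enumerate(lines, start=1):
            if FORBIDDEN.search(line):
                hits.append((path.name, number))
    return hits


def test_no_forbidden_calls(root="src"):
    """Fails (under a test runner such as pytest) when a violation exists."""
    violations = find_violations(root)
    assert not violations, f"forbidden print() calls: {violations}"


# Demo against a throwaway directory so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "good.py").write_text("x = 1\n", encoding="utf-8")
    Path(tmp, "bad.py").write_text("x = 1\nprint(x)\n", encoding="utf-8")
    demo_hits = find_violations(tmp)

print(demo_hits)
```

Because the check lives in the test suite, the model gets the constraint as a failing test with a file and line number, rather than as an "always" or "never" rule it has to remember.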