
Comment by gpm

19 hours ago

An eager intern can remember things you tell them beyond what would fit in an hour's conversation.

A disgruntled employee definitely remembers things beyond that.

These are fundamentally different sorts of interaction.

Agreed, but the point is, if your system is resilient against an eager intern who has not had the necessary guidance, or an actively hostile disgruntled employee, that inherently restricts the harm an LLM can do.

I'm not making the case that LLMs learn like people. I'm making the case that if your system is hardened against things people can do (which it should be, beyond a certain scale) it is also similarly hardened against LLMs.

The big difference is that LLMs are probably a LOT more capable than either of those at overcoming barriers. Probably a good reason to harden systems even more.

  • The difference makes the necessary barriers different.

    There's benefit to letting a human make and learn from (minor) mistakes. No such benefit accrues with an LLM, because it is structurally unable to learn from them.

    There's the potential of malice, not just mistakes, from the human. If you carefully control the LLM's context, there is no such potential for the LLM, because it restarts from the same non-malicious state with every context window.

    There's the potential of information leakage through the human, because they retain their memories when they go home at night, and when they quit and go to another job. You can carefully control the outputs of the LLM so there is simply no mechanism for information to leak.

    If a human is convinced to betray the company, you can punish the human, for whatever that's worth (quite a lot in some people's opinion, I think; not sure I agree). There is simply no way to punish an LLM - it isn't even clear what punishing one would mean. The weights file? The GPU that ran the weights file?

    And on the "controls" front (but unrelated to the above note about memory): LLMs are fundamentally only able to manipulate whatever computers you hook them up to, while people are agents in the physical world, able to go physically do all sorts of things without your assistance. The nature of the necessary controls ends up being fundamentally different.

    • A lot of 'agentic harnesses' actually do have limited memory functions these days. In the simplest form, the LLM can write to a file like memory.md, claude.md, or agent.md, and this gets tacked onto its system prompt going forward (a rough sketch of this pattern follows below). This does help a bit at least.

      Rather more sophisticated Retrieval Augmented Generation (RAG) systems exist.

      At the moment it's a very mixed bag, with some frameworks and harnesses giving very minimal memory, while others use hybrid vector/full-text lookups, diverse data structures, and more. It's like a Cambrian explosion right now.

      Thing is, this is probabilistic, and the influence of these memories weakens as your context length grows. If you don't manage context properly (and sometimes even when you think you do), the LLM can blow past in-context restraints, since they are not 100% binding. That's why you still need mechanical safeguards (e.g. scoped credentials, isolated environments) underneath.
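      To make the memory-file pattern above concrete, here is a minimal sketch, assuming a hypothetical harness that simply prepends the file's contents to the system prompt each turn (memory.md, build_system_prompt, and remember are illustrative names, not any specific framework's API):

      ```python
      from pathlib import Path

      MEMORY_FILE = Path("memory.md")  # hypothetical location of the agent's notes

      def build_system_prompt(base_prompt: str) -> str:
          """Tack the persisted notes onto the system prompt for the next turn."""
          memory = MEMORY_FILE.read_text() if MEMORY_FILE.exists() else ""
          return f"{base_prompt}\n\n# Remembered notes\n{memory}"

      def remember(note: str) -> None:
          """Append a note the agent should see on every future turn."""
          with MEMORY_FILE.open("a") as f:
              f.write(f"- {note}\n")

      # Notes survive across context windows because the harness rebuilds the
      # prompt each turn - but they remain plain text the model may weigh less
      # and less as the rest of the context grows.
      ```

      Which is exactly the limitation: everything here is still just prompt text, so the mechanical safeguards mentioned above have to sit outside the model.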

You can easily persist agent memories in a markdown file though.

  • And the Memento guy had tattoos of key information. That didn't mean he no longer had memory loss.

    • Pretty good metaphor.

      Limited space to work with, highly context dependent and likely to get confused as you cover more surface area.

  • and you'll blow the context over time and send the LLM to the sanatorium. It doesn't all fit the way it can in a human brain.

    If a junior fucks production, that will carry extraordinary weight, because they appreciate the severity, feel the social shame, and will have nightmares about it. If you write some negative prompt to "not destroy production", then you also need to define some sort of watertight memory-weighting system (which doesn't exist) and specify it in great detail. Otherwise the LLM will treat that command as only as important as the last negative prompt you typed in, or ignore it when it conflicts with a more recent command.

    • > and you'll blow the context over time and send the LLM to the sanatorium. It doesn't all fit the way it can in a human brain.

      The LLM did have this capability at training time, but weights are frozen at inference time. This is a big weakness in current transformer architectures.

  • Yup, and the agent will happily ignore any and all markdown files, and will say "oops, it was in the memory, will not do it again", and will do it again.

    Humans actually learn. And if they don't, they are fired.

    • To me it sounds like a tooling problem. OP seems to be trying to use probabilistic text systems as if they enforce rules, but rule enforcement should really live outside the model. My sense is that there was a failure to verify the agent's intent.

      The tooling that invokes the model should really define some kind of guardrails. I feel like there's an analogy to be had here with the difference between an untyped program and a typed program. The typed program has external guardrails that get checked by an external system (the compiler's type checker).
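
      As a rough illustration of that kind of external guardrail (ALLOWED_COMMANDS, PROTECTED_PATHS, and check_tool_call are made-up names for the example, not any particular framework's API): the tooling validates each action the agent proposes before executing it, so no prompt wording can talk its way past the check.

      ```python
      import shlex

      ALLOWED_COMMANDS = {"ls", "cat", "grep"}   # scoped, read-only tools only
      PROTECTED_PATHS = ("/prod", "/etc")        # paths the agent must never touch

      def check_tool_call(command: str) -> None:
          """Reject a proposed shell command before it ever runs.

          Like a type checker, this runs outside the model, so it holds
          regardless of what the context window says.
          """
          argv = shlex.split(command)
          if not argv or argv[0] not in ALLOWED_COMMANDS:
              raise PermissionError(f"command not allowed: {command!r}")
          if any(arg.startswith(PROTECTED_PATHS) for arg in argv[1:]):
              raise PermissionError(f"protected path in: {command!r}")
      ```

      In the analogy, the model is the untyped program and this check is the external type checker that actually enforces the rule.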
