Comment by zbentley

4 hours ago

> the reality is that we are mixing instruction and data in the same context window.

Absolutely.

But the history of code/data confusion attacks that you alluded to in the GP doesn't map apples-to-apples onto the code/data confusion risks that LLMs are susceptible to.

Historical code/data confusion issues were almost entirely programmatic errors, not operational characteristics, and the two need to be treated as qualitatively different problems in order to address them. The nitpicking around buffer overflows was meant to highlight that point.

Programmatic errors can be prevented proactively (e.g. with sanitizers or programmer discipline), and fixing a given error resolves it permanently. Operational characteristics cannot be proactively prevented and require a different approach to de-risk.

Put another way: you can fully prevent a buffer overflow with bounds checking on the buffer, and you can fully prevent a SQL injection with query parameters. You cannot prevent system crashes due to external power loss or hardware failure. You can reduce the chance of those things happening, but when building a system to deal with them you have to think in terms of mitigating an inevitable failure, not preventing or permanently remediating a given failure mode. Power loss risk is thus an operational characteristic to be worked around, not a class of programmatic error that can be resolved or prevented.

LLMs’ code/data confusion, given current model architecture, is in the latter category.
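To make the "fully preventable" side of that concrete, here is a minimal sketch of the SQL injection case (the table and input are hypothetical, purely for illustration): binding the value as a query parameter keeps it in the data channel, so it is never parsed as SQL, and once that fix is in place that instance of the bug stays fixed.

```python
import sqlite3

# Hypothetical table and data, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

# Attacker-controlled input that would break out of a string-concatenated query.
user_input = "alice' OR '1'='1"

# Vulnerable: data spliced directly into the code channel.
# conn.execute("SELECT * FROM users WHERE name = '" + user_input + "'")

# Fixed: the value is bound as a parameter, so it is never parsed as SQL.
rows = conn.execute("SELECT * FROM users WHERE name = ?", (user_input,)).fetchall()
print(rows)  # [] -- the injection string matches no user name
```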

I think the distinction between programmatic error (solvable) and operational characteristic (mitigatable) is valid in theory, but I disagree that it matters in practice.

Proactive prevention (like bounds checking) only "solves" the class of problem if you assume 100% developer compliance, and history shows we don't get that. So while the root cause differs (deterministic code vs. a probabilistic model), the failure mode is identical: we are deploying systems whose default state is unsafe.

In that sense, it is an apples-to-apples comparison of risk. Relying on perfect discipline to secure C memory is functionally as dangerous as relying on prompt engineering to secure an LLM.

  • Agree to disagree. I think it's significant that a given instance of a programmatic error, once fixed, stays fixed.

    I also think that if we’re assessing the likelihood of the entire SDLC producing an error (programmers, choice of language, tests/linters/sanitizers, discipline, deadlines, and so on) and comparing that to the behavior of a running LLM, we’re making a category error and zooming out too far to find useful insights into how to make things better.

    But I think we’re both clear on those positions and it’s OK if we don’t agree. FWIW I do strongly agree that

    > Relying on perfect discipline to secure C memory is functionally as dangerous as relying on prompt engineering to secure an LLM.

    …just for different reasons that suggest qualitatively different solutions.