Comment by K0nserv

2 days ago

The security endgame of LLMs terrifies me. We've designed a system that only supports in-band signalling, undoing hard learned lessons from prior system design. There are ampleattack vectors ranging from just inserting visible instructions to obfuscation techniques like this and ASCII smuggling[0]. In addition, our safeguards amount to nicely asking a non deterministic algorithm to not obey illicit instructions.

0: https://embracethered.com/blog/posts/2024/hiding-and-finding...

20 comments

K0nserv

nartho 2 days ago

Seeing more and more developers having to beg LLMs to behave in order to do what they want is both hilarious and terrifying. It has a very 40k feel to it.

K0nserv 2 days ago
Haha, yes! I'm only vaguely familiar with 40k, but LLM prompt engineering has strong "Praying to the machine gods" / tech-priest vibes.
- thrown-0825 2 days ago
  
  its not engineering, its arcane incantations to a black box with non-deterministic output

matsemann 2 days ago

It's like old school php where we used string concatenation with user input to generate queries and a whack-a-mole of trying to detect harmful strings.

So stupid, the fact that we can't distinguish between data and instructions and do the same mistakes decades later..

robin_reala 2 days ago

The other safeguard is not using LLMs or systems containing LLMs?

GolfPopper 2 days ago
But, buzzword!
We need AI because everyone is using AI, and without AI we won't have AI! Security is a small price to pay for AI, right? And besides, we can just have AI do the security.
- IgorPartola 2 days ago
  
  You wouldn’t download an LLM to be your firewall.
  
  1 reply →

_flux 2 days ago

Yeah, it's quite amazing how none of the models seem to be any "sudo" tokens that could be used to express things normal tokens cannot.

nneonneo 2 days ago
"sudo" tokens exist - there are tokens for beginning/end of a turn, for example, which the model can use to determine where the user input begins and ends.
But, even with those tokens, fundamentally these models are not "intelligent" enough to fully distinguish when they are operating on user input vs. system input.
In a traditional program, you can configure the program such that user input can only affect a subset of program state - for example, when processing a quoted string, the parser will only ever append to the current string, rather than creating new expressions. However, with LLMs, user input and system input is all mixed together, such that "user" and "system" input can both affect all parts of the system's overall state. This means that user input can eventually push the overall state in a direction which violates a security boundary, simply because it is possible to affect that state.
What's needed isn't "sudo tokens", it's a fundamental rethinking of the architecture in a way that guarantees that certain aspects of reasoning or behaviour cannot be altered by user input at all. That's such a large change that the result would no longer be an LLM, but something new entirely.
- _flux 2 days ago
  
  I was actually thinking sudo tokens as a completely separate set of authoritative tokens. So basically doubling the token space. I think that would make it easier for the model to be trained to respect them. (I haven't done any work in this domain, so I could be completely wrong here.)
  
  2 replies →
- est 1 day ago
  
  It's like ASCII control characters and display characters lmao

DrewADesign 2 days ago

We have created software sophisticated enough to be vulnerable up social engineering attacks. Strange times.

volemo 2 days ago

It’s serial terminals all over again.

pjc50 2 days ago

As you say, the system is nondeterministic and therefore doesn't have any security properties. The only possible option is to try to sandbox it as if it were the user themselves, which directly conflicts with ideas about training it on specialized databases.

But then, security is not a feature, it's a cost. So long as the AI companies can keep upselling and avoid accountability for failures of AI, the stock will continue to go up, taking electricity prices along with it, and isn't that ultimately the only thing that matters? /s

joe_the_user 2 days ago

What lessons have organizations learned about security?

Hire a consultant who can say you're following "industry standards"?

Don't consider secure-by-design applications, keep your full-featured piece of jump but work really hard to plug holes, ideally by paying a third party or better getting your customers to pay ("anti-virus software").

Buy "security as product" software allow with system admin software and when you get a supply chain attack, complain?