Comment by nicoburns

8 hours ago

It seems to me like it's a fundamentally unsolvable architectural issue with LLMs. Ultimately the only protection is to limit the powers we grant to any given LLM to reduce the fallout when (not if) things go wrong (much like we do with people).

Of all the "AI doomsday" scenarios, people failing to understand this (and treating AIs like deterministic computers) seem like to most likely to cause issues.

26 comments

nicoburns

jmount 6 hours ago

I really think one needs a "Harvard architecture" for AIs (data independent of instructions). Though yes, that may not be possible.

dejj 5 hours ago

RFC 3514 “evil bit” header flag to the rescue: https://www.rfc-editor.org/info/rfc3514/
airstrike 5 hours ago
It's not possible with today's LLM models, but we are not wedded to the current architecture.
- SlinkyOnStairs 4 hours ago
  
  Realistically, we are.
  This is not some arbitrary design choice, it's the core compromise to make LLMs viable to train at all.
  
  4 replies →
crooked-v 5 hours ago
I doubt it's possible, regardless of specific architecture, because if you want an AI that can do general purpose tasks like "look at my calendar and find a restaurant for the lunch meeting that the other people also like, but make sure nobody has to travel more than 20 minutes to get there, and it can't be too cold inside", then it has to ingest and understand a bunch of data to do that. The whole point is that the decision-making process is reading everything. The only "fix" is to make an AI smart enough that it can understand context for each item, which is a tall order.
- gopher_space 5 minutes ago
  
  > The only "fix" is to make an AI smart enough that it can understand context for each item, which is a tall order.
  Impossible as you said. Context isn’t static, it’s continuous, analog, and a conglomeration of viewpoints.
  AI cannot create useful context for itself because it is a machine with no desires. It doesn’t have a point of view, it has historical records. It moves forward in time by walking backwards (if that makes sense?)
- acdha 1 hour ago
  
  This is especially true because so much of that data comes from outside of your organization. I receive Google Calendar invites from scammers a couple of times a week and those show up in my invitation list just like anything else. If LLMs start screening things, that kind of thing will become even more popular but most of us can’t just ignore everyone outside of our employer’s directory.
- wat10000 3 hours ago
  
  Humans are vulnerable to prompt injection as well. We usually call it something like "social engineering."
  
  2 replies →

Angostura 8 hours ago

Jokes on them. My bank will just truncate it to 10 characters.

TacticalCoder 5 hours ago
> Jokes on them. My bank will just truncate it to 10 characters.
You do understand that this is just an example out of a bazillion and that planning to solve every place where data is fed to LLMs at 10 characters so that it's not mistaken for instructions ain't a viable solution?
- Angostura 5 hours ago
  
  Yes. I was being humorous. Apologies

madamelic 5 hours ago

> Ultimately the only protection is to limit the powers we grant to any given LLM to reduce the fallout when (not if) things go wrong (much like we do with people).

I have been working on something like that: https://clawband.io

It's not quite ready for 'showtime' but feel free to take a look and give your impressions if you'd like. I feel the exact same way: I want to allow my agent to perform actions on all services but also limit what they can do.

Basically my idea is wrapping individual service's APIs and then the middleware (Clawband in this case) enforces granular permissioning such as "can make credit cards but only up to $50" or "can send emails but only to specific domains". The agent never gets a raw API key to a service, it uses an intermediate API key that gets exchanged in the backend for calling the service after permissioning has been enforced.

mike_hock 4 hours ago

I can't believe that fucking Terminator was prophetic.

KaiShips 5 hours ago

[flagged]

embedding-shape 6 hours ago

> It seems to me like it's a fundamentally unsolvable architectural issue with LLMs.

Seems solved already? Exactly what the system/user division is about, and if that's not enough for you, use a model that has a developer/system/user divide.

Today's SOTA LLMs have pretty excellent following of these divisions, and the user "instructions", regardless if they're smuggled in, won't override the system ones.

The difficulty comes when you accept completely unreviewed/unchanged user-input as user messages, as your system/developer prompts needs to take this into account. You're better off to kind of whitelist what's possible rather than trying to prevent specific things, but seems that hasn't fully caught on yet.

It feels like people and organizations are still trying to discover what works or not, and there are huge gaps being being left open because there simply isn't enough understanding of the limitations and impact of what they make available to users. We're already seeing it in lots of places, feels like it won't get better before it gets worse.

sillysaurusx 5 hours ago

> Today's SOTA LLMs have pretty excellent following of these divisions
Unfortunately "pretty excellent" is different from "perfect." I haven't kept track, but are you certain that given all possible inputs, the user prompt will never override the system prompt?
Those are strong claims, and unless there's been an advancement in the tech, it doesn't seem possible. Reinforcement learning might make it much less likely, but that's different from impossible.
Muromec 5 hours ago
If it was solved, the bug like this would not happen.
It is also not always clear who is the user and how much they should be obeyed
- embedding-shape 4 hours ago
  
  > If it was solved, the bug like this would not happen.
  Only if you only read the first line in my comment, there is more under that one too.
  It is clear, if you make it clear. These bugs happen because they don't clearly understand what should go where.
thesmtsolver2 4 hours ago
> whitelist what's possible
Why do you need LLMs in the first place if you are whitelisting possible inputs?
You can use a much simpler and less costly system.
- embedding-shape 4 hours ago
  
  There is like a billion use cases out there, lord knows why some people do some stuff. There are more use cases than just "creative text" or free-form outputs, lots of other things, paired together with an harness too. Like an support agent even perhaps.