← Back to context

Comment by embedding-shape

6 hours ago

> It seems to me like it's a fundamentally unsolvable architectural issue with LLMs.

Seems solved already? Exactly what the system/user division is about, and if that's not enough for you, use a model that has a developer/system/user divide.

Today's SOTA LLMs have pretty excellent following of these divisions, and the user "instructions", regardless if they're smuggled in, won't override the system ones.

The difficulty comes when you accept completely unreviewed/unchanged user-input as user messages, as your system/developer prompts needs to take this into account. You're better off to kind of whitelist what's possible rather than trying to prevent specific things, but seems that hasn't fully caught on yet.

It feels like people and organizations are still trying to discover what works or not, and there are huge gaps being being left open because there simply isn't enough understanding of the limitations and impact of what they make available to users. We're already seeing it in lots of places, feels like it won't get better before it gets worse.

> Today's SOTA LLMs have pretty excellent following of these divisions

Unfortunately "pretty excellent" is different from "perfect." I haven't kept track, but are you certain that given all possible inputs, the user prompt will never override the system prompt?

Those are strong claims, and unless there's been an advancement in the tech, it doesn't seem possible. Reinforcement learning might make it much less likely, but that's different from impossible.

If it was solved, the bug like this would not happen.

It is also not always clear who is the user and how much they should be obeyed

  • > If it was solved, the bug like this would not happen.

    Only if you only read the first line in my comment, there is more under that one too.

    It is clear, if you make it clear. These bugs happen because they don't clearly understand what should go where.

> whitelist what's possible

Why do you need LLMs in the first place if you are whitelisting possible inputs?

You can use a much simpler and less costly system.

  • There is like a billion use cases out there, lord knows why some people do some stuff. There are more use cases than just "creative text" or free-form outputs, lots of other things, paired together with an harness too. Like an support agent even perhaps.