Comment by embedding-shape

1 month ago

> Authority state (what constraints are actively enforced)

This, I'm not sure what to do about, I think LLMs might just not be a good fit for this.

> Temporal consistency (constraints that persist across turns)

This can be solved by stop using LLMs as "can take turns" and only use them as "one-shot answer otherwise wrong" machines, as prompt following is the best early in a conversation, and gets really bad quickly as the context grows. Personally, I never go beyond two messages in a chat (one user message, one assistant message), and if it's wrong, I clear everything, iterate on the first prompt, and try again. Tends to make the whole "follow system prompt instructions" a lot better.

> Hierarchical control (immutable system policies vs. user preferences)

This I think at least was attempted to be addresses in the release of GPT-OSS, where instead of just having system prompt and user prompt, it now has developer, system and user prompt, so there is a bigger difference in how the instructions are being used. This document shares some ideas about separating the roles more than just system/user: https://cdn.openai.com/spec/model-spec-2024-05-08.html

1 comment

embedding-shape

csemple 1 month ago

Yep, you nailed the problem: context drift kills instruction following.

That's why I’m thinking authority state should be external to the model. If we rely on the System Prompt to maintain constraints ("Remember you are read-only"), it fails as the context grows. By keeping the state in an external Ledger, we decouple enforcement from the context window. The model still can't violate the constraint, because the capability is mechanically gone.