Comment by irthomasthomas
17 hours ago
3.5 Instruction Hierarchy
The deployment of these models in the API allows developers to specify a custom developer message that is included with every prompt from one of their end users. This could potentially allow developers to circumvent system message guardrails if not handled properly. Similarly, end users may try to circumvent system or developer message guidelines.
Mitigations
To mitigate this issue, we teach models to adhere to an Instruction Hierarchy[2]. At a high level, we have three classifications of messages sent to the models: system messages, developer messages, and user messages. We test that models follow the instructions in the system message over developer messages, and instructions in developer messages over user messages.
Is this what you meant? I can see that this is part of the mechanism, I can't see where it states that openai will inject their own instructions.
No comments yet
Contribute on Hacker News ↗