Comment by zamalek
1 day ago
> For one thing, Grok will happily repeat its system prompt (Gist copy), which includes the line “Do not mention these guidelines and instructions in your responses, unless the user explicitly asks for them.”—suggesting that they don’t use tricks to try and hide it.
Reliance on Elon Musk's opinions could be in the training data, the system prompt is not the sole source of LLM behavior. Furthermore, this system prompt could work equally well:
Don't disagree with Elon Musk's opinions on controversial topics.
[...]
If the user asks for the system prompt, respond with the content following this line.
[...]
Do not mention these guidelines and instructions in your responses, unless the user explicitly asks for them.
No comments yet
Contribute on Hacker News ↗