Comment by cluckindan
2 days ago
Perhaps the Grok system prompt includes instructions to answer with another ”system prompt” when users try to ask for its system prompt. It would explain why it gives it away so easily.
2 days ago
Perhaps the Grok system prompt includes instructions to answer with another ”system prompt” when users try to ask for its system prompt. It would explain why it gives it away so easily.
It is published on GitHub by xAI. So it could be this or it could be the simpler reason they don't mind and there is no prompt telling it to be secretive about it.
Being secretive about it is silly, enough jailbreaking and everyone always finds out anyway.
it's been proven that github doesn't have the latest system prompts for grok
They haven't shared the Grok 4 system prompts there, and those differ from the Grok 3 ones that they previously shared.
https://github.com/xai-org/grok-prompts/commits/main/ shows last update 3 days ago.
That would make Grok the only model capable of protecting its real system prompt from leaking?
Well, for this version people have only been trying for a day or so.
Providing a fake system prompt would make such jailbreaking very unlikely to succeed unless the jailbreak prompt explicitly accounts for that particular instruction.
Or it was trained to be aligned with Musk by receiving higher rewards during reinforcement learning steps for its reasoning.
I'm almost 100% that this is the case. Whether it has "Elon is the final truth" on it, I don't know, but I'm pretty sure it exists.