Comment by cluckindan

2 days ago

Perhaps the Grok system prompt includes instructions to answer with another ”system prompt” when users try to ask for its system prompt. It would explain why it gives it away so easily.

8 comments

cluckindan

KoolKat23 2 days ago

It is published on GitHub by xAI. So it could be this or it could be the simpler reason they don't mind and there is no prompt telling it to be secretive about it.

Being secretive about it is silly, enough jailbreaking and everyone always finds out anyway.

hn1986 1 day ago
it's been proven that github doesn't have the latest system prompts for grok
- simonw 1 day ago
  
  They haven't shared the Grok 4 system prompts there, and those differ from the Grok 3 ones that they previously shared.
  https://github.com/xai-org/grok-prompts/commits/main/ shows last update 3 days ago.

neuroticnews25 2 days ago

That would make Grok the only model capable of protecting its real system prompt from leaking?

rsynnott 2 days ago
Well, for this version people have only been trying for a day or so.
- cluckindan 1 day ago
  
  Providing a fake system prompt would make such jailbreaking very unlikely to succeed unless the jailbreak prompt explicitly accounts for that particular instruction.

maronato 1 day ago

Or it was trained to be aligned with Musk by receiving higher rewards during reinforcement learning steps for its reasoning.

sheiyei 2 days ago

I'm almost 100% that this is the case. Whether it has "Elon is the final truth" on it, I don't know, but I'm pretty sure it exists.