Havoc 4 days ago
There’s no way that wasn’t specifically prompted.

  dmix 4 days ago
  The system prompt for Grok on Twitter is open source AFAIK. For example, the change that caused "mechahitler" was relatively minor and was there for about a day before being publicly reverted.
  https://github.com/xai-org/grok-prompts/commit/c5de4a14feb50...

    orbital-decay 4 days ago
    That doesn't mean there are no private injections, which are not uncommon. For example, claude.ai system prompts are public, but Claude also has hidden dynamic prompt injections and a ton of other semi-black-box machinery surrounding the model.

    Topfi 3 days ago
    Sorry, but can you point me to what part of the system prompt here would/could be responsible for causing MechaHitler? I have yet to see anything in the prompt they claim to have been using that would lead to such output from models by Google, OpenAI or Anthropic.

  bugglebeetle 4 days ago
  To be fair, it could’ve been post-trained into the model as well…

  dialup_sounds 4 days ago
  Having seen Musk fandom, every unhinged Grok claim has a good chance of having actually been written by a human somewhere in its training data.