Comment by Havoc

4 days ago

There’s no way that wasn’t specifically prompted.

The system prompt for Grok on Twitter is open source AFAIK.

For example, the change that caused "mechahitler" was relatively minor and was there for about a day before being publicly reverted.

https://github.com/xai-org/grok-prompts/commit/c5de4a14feb50...

  • That doesn't mean there are no private injections. Those are not uncommon: for example, the claude.ai system prompts are public, but Claude also has hidden dynamic prompt injections and a ton of other semi-black-box machinery surrounding the model.

  • Sorry, but can you point me to what part of the system prompt here would/could be responsible for causing MechaHitler?

    I have yet to see anything in the prompt they claim to have been using that would lead to such output from models by Google, OpenAI or Anthropic.

Having seen Musk fandom, every unhinged Grok claim has a good chance of having actually been written by a human somewhere in its training data.