Comment by dmix

4 days ago

The system prompt for Grok on Twitter is open source AFAIK.

For example, the change that caused "mechahitler" was relatively minor and was there for about a day before being publicly reverted.

https://github.com/xai-org/grok-prompts/commit/c5de4a14feb50...

That doesn't mean there are no private injections, which are not uncommon. For example, claude.ai system prompts are public, but Claude also has hidden dynamic prompt injections and a ton of other semi-black-box machinery surrounding the model.

Sorry, but can you point me to what part of the system prompt here would/could be responsible for causing MechaHitler?

I have yet to see anything in the prompt they claim to have been using that would lead to such output from models by Google, OpenAI or Anthropic.