Comment by Havoc

3 months ago

There’s no way that wasn’t specifically prompted.


Havoc

The system prompt for Grok on Twitter is open source AFAIK.

For example, the change that caused "MechaHitler" was relatively minor and was live for about a day before being publicly reverted.

orbital-decay 3 months ago

That doesn't mean there are no private injections. That's not uncommon: for example, claude.ai system prompts are public, but Claude also has hidden dynamic prompt injections and a ton of other semi-black-box machinery surrounding the model.

Topfi 3 months ago

Sorry, but can you point me to what part of the system prompt here would/could be responsible for causing MechaHitler?
I have yet to see anything in the prompt they claim to have been using that would lead to such output from models by Google, OpenAI or Anthropic.

To be fair, it could’ve been post-trained into the model as well…

Having seen the Musk fandom, I'd say every unhinged Grok claim has a good chance of having actually been written by a human somewhere in its training data.