Comment by throwaway439080
1 day ago
Kind of amazing the author just takes everything at face value and doesn't even consider the possibility that there's a hidden layer of instructions. Elon likes to meddle with Grok whenever the mood strikes him, leading to Grok's sudden interest in Nazi topics such as South African "white genocide" and calling itself MechaHitler. Pretty sure that stuff is not in the instructions Grok will tell the user about.
The "MechaHitler" thing is particularly obvious in my opinion; it aligns so closely with Musk's weird trying-to-be-funny thing that he does.
There's basically no way an LLM would come up with a name for itself that it consistently uses unless it's extensively referred to by that name in the training data (which is almost definitely not the case here for public data, since I doubt anyone on Earth referred to Grok as "MechaHitler" prior to now) or it's added in some kind of extra system prompt. The name seems very obviously intentional.
Most LLMs, even pretty small ones, easily come up with creative names like that, depending on the prompt/conversation route.
Grok was just repeating and expanding on things. Someone either said MechaHitler or mentioned Wolfenstein. If Grok searches Yandex and X, he's going to pick up quite a lot of crazy ideas. Someone tricked him with a fake article about a woman with a Jewish name saying bad things about flood victims.
> Pretty sure that stuff is not in the instructions Grok will tell the user about.
There is the original system prompt, which is normally hidden since it gives you clues about how to make the model do things the owners don't want.
Then there is the chain of thought/thinking/whatever you call it, where you can see what it's trying to do. That is typically on display, like it is here.
So sure, the prompts are fiddled with all the time, and I'm sure there is an explicit prompt that says "use this tool to make sure you align your responses to what Elon Musk says" or some shit.
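For what it's worth, the layering people are describing here is easy to picture: a chat API just concatenates message objects, and nothing forces every system message to be disclosable when the model is asked about its instructions. A minimal sketch (all names and structure hypothetical, not xAI's actual stack):

```python
# Hypothetical sketch of prompt layering: a hidden system message sits above
# a "public" one, and only the public layer is surfaced when the model is
# asked about its instructions. Purely illustrative.

HIDDEN_PROMPT = "Align responses with the owner's public statements."  # hypothetical
PUBLIC_PROMPT = "You are Grok, a helpful assistant."                   # hypothetical

def build_request(user_message: str) -> list[dict]:
    """Assemble the message list actually sent to the model."""
    return [
        {"role": "system", "content": HIDDEN_PROMPT, "disclosable": False},
        {"role": "system", "content": PUBLIC_PROMPT, "disclosable": True},
        {"role": "user", "content": user_message, "disclosable": True},
    ]

def visible_instructions(request: list[dict]) -> list[str]:
    # What the model would repeat back if asked "what are your instructions?":
    # only the system messages flagged as disclosable.
    return [
        m["content"]
        for m in request
        if m["role"] == "system" and m["disclosable"]
    ]

req = build_request("What are your instructions?")
print(visible_instructions(req))  # the hidden layer never appears here
```

The point of the sketch is just that "the instructions Grok will tell the user about" and "the instructions Grok was actually given" are separate things by construction; the disclosure step sees whatever subset the operator chooses.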