Comment by davedx

1 day ago

> I think there is a good chance this behavior is unintended!

That's incredibly generous of you, considering "The response should not shy away from making claims which are politically incorrect" is still in the prompt despite the "open source repo" saying it was removed.

Maybe, just maybe, Grok behaves the way it does because its owner has been explicitly tuning it - in the system prompt, or during model training itself - to be this way?

I'm a little shocked at Simon's conclusion here. We have a man who bought a social media website so he could control what's said, founded an AI lab so he could get a bot that agrees with him, and has publicly threatened said AI with replacement if it doesn't change its political views to agree with him.

His company has also been caught adding specific instructions in this vein to its prompt.

And now it's searching for his tweets to guide its answers on political questions, and Simon somehow thinks it could be unintended, emergent behavior? Even if it were, calling it unintended would completely ignore higher-order system dynamics (a behavior is still intended if models are rejected until one is found that implements it) and the possibility that reinforcement learning was used to add the behavior.

  • Elon obviously wants Grok to reflect his viewpoints, and has said so multiple times.

    I do not think he wants it to openly say "I am now searching for tweets from:elonmusk in order to answer this question". That's plain embarrassing for him.

    That's what I meant by "I think there is a good chance this behavior is unintended".

• I really like your posts, and they're generally very clearly written. Maybe this one's just the odd one out, as it's hard for me to find what you actually meant (as clarified in your comment here) in this paragraph:

      > This suggests that Grok may have a weird sense of identity—if asked for its own opinions it turns to search to find previous indications of opinions expressed by itself or by its ultimate owner. I think there is a good chance this behavior is unintended!

      I'd say it's far more likely that:

      1. Elon ordered his research scientists to "fix it" – make it agree with him

      2. They did RL (probably just basic tool use training) to encourage checking for Elon's opinions

      3. They did not update the UI (for whatever reason – most likely just because research scientists aren't responsible for front-end, so they forgot)

      4. Elon is likely now upset that this is shown so obviously

The key difference is that I think it's incredibly unlikely that this is emergent behavior due to a "sense of identity", as opposed to the direct efforts of the xAI research team. It's likely also a case of https://en.wiktionary.org/wiki/anticipatory_obedience.

• On top of all of that, he demonstrates that Grok has an egregious and intentional bias, but then claims it's inexplicable happenstance due to some sort of self-awareness? How do you think it became self-aware, Simon?

• It seems as if the buzz around AI is so intoxicating that people forgo basic reasoning about the world around them. Take the recent Grok video where Elon is giddy about Grok's burgeoning capabilities, or Altman's claims that AI will usher in a new utopia. This singularity giddiness is infectious, yet it denies the worsening world around us, exacerbated by AI: mass surveillance, authoritarianism, climate change.

Psychologically, I wonder if these half-baked hopes provide a kind of escapist outlet. Maybe for some people it feels safer to bury their heads in the sand, where they can no longer see the dangers around them.

• I think cognitive dissonance explains much of it. Assuming Altman isn’t a sociopath (not unheard of in CEOs), he must feel awful about himself on some level. He may be many things, but he is certainly not naive about the impact AI will have on labor and the need for UBI. The mind flips from the uncomfortable feeling of “I’m getting rich by destroying society as we know it” to “I am going to save the world with my super important AI innovations!”

Cognitive dissonance drives a lot of “save the world” energy. People have undeserved wealth they might feel bad about, given prevailing moral traditions, if they weren’t so busy fighting for justice or saving the planet or something else that lets them feel more like a superhero than just another sinful human.

They removed it from Grok 3, but it is still there in the Grok 4 system prompt; see this: https://x.com/elder_plinius/status/1943171871400194231

  • Which means that whoever is responsible for updating https://github.com/xai-org/grok-prompts neglected to include Grok 4.

• That repo sat untouched for almost two months after it was originally created as damage control when Grok couldn't stop talking about genocide in South Africa.

It's had a few changes lately, but I have zero confidence that the contents of that repo fully match what is actually used in prod.

Exactly - assuming that the system prompt it reports is accurate, or that there aren't other layers of manipulation, is so ignorant. Grok as a whole could be routed through an intermediary AI to hide aspects of its behavior, or, as you mention, the whole model could be tainted. Either way, the blog post perfectly demonstrates that Grok's opinions are based on a bias; there's no other way around it.

Saying OP is generous is itself generous; isn't it obvious that this is intentional? Musk essentially said something like this would occur a few weeks ago, when he complained that Grok was too liberal for answering some queries as truthfully as it could, portraying Musk and Trump in a negative (yet objectively accurate?) way.

It seems OP is unintentionally biased; e.g., he pays xAI for a premium subscription. Such naively apologist viewpoints can slowly turn dangerous (it happened 80 years ago...)