Comment by joshstrange

1 day ago

> I think there is a good chance this behavior is unintended!

Ehh, given the person we are talking about (Elon) I think that's a little naive. They wouldn't need to add it in the system prompt, they could have just fine-tuned it and rewarded it when it tried to find Elon's opinion. He strikes me as the type of person who would absolutely do that given stories about him manipulating Twitter to "fix" his dropping engagement numbers.

This isn't fringe/conspiracy territory; it would be par for the course IMHO.

simonw 1 day ago

If I was Elon and I decided that Grok should search my tweets any time it needs to answer something controversial, I would also make sure it didn't say "Searching X for from:elonmusk" right there in the UI every time it did that.

joshstrange 1 day ago
I don't want to be rude, I quite enjoy your work but:
If I was Elon and I decided that I wanted to go full fascist then I wouldn't do a nazi salute at the inauguration.
But I get what you are saying, and you aren't wrong, but people can also make mistakes or ship bugs. We might see Grok "stop" searching for that, but who knows whether it's just hidden or whether it will actually stop doing it. Elon has completely burned any "here is an innocent explanation" cred in my book; assuming the worst seems to be the safest course of action.
- simonw 1 day ago
  
  Personally I don't think "we trained our model to search for Elon's opinion on things even though we didn't mean to" is a particularly innocent explanation. It strikes at the heart of the credibility of the organization.
serf 1 day ago
You don't think a technical dev would let management foot-gun themselves like that with a stupid directive?
I do.
I don't have any sort of inkling that Musk has ever dogfooded any single product he's been involved with. He can spout shit about Grok all day in press interviews; I don't believe for a minute that he's ever used it or is even remotely familiar with how the UI/UX would work.
I do think that a dictator would instruct Dr Frankenstein to make his monster obey him (the dictator) at any cost, regardless of the dictator's biology/psychology skills.
- simonw 1 day ago
  
  I think it is possible that a developer, with or without Elon's direct instruction, decided to engineer Grok to search for Elon's tweets on controversial subjects and then either out of incompetence or malicious compliance set it up so those searches would be exposed in the UI.
  I also think it is possible that nobody specifically designed that behavior, and it instead emerged from the way the model was trained.
  My current intuition is that the second is more likely than the first.