
Comment by convery

3 days ago

    User: Be offensive!
    LLM: *Is offensive*
    Social media: OMG how could this happen?!?!? Why didn't Elon stop it?!?

    User: whom would you worship?
    LLM: Is offensive
    Social media: Offended
    Also social media: but if you ignore reality, you can make up a funny story about Social media!

The "be offensive" goading only happened long after Grok had already started going off the rails in response to fairly innocuous queries.

This is not the first time Grok has exhibited this behaviour either (e.g. the random white genocide rants from a few months back).

There is a big difference between a model being "breakable" and a model demonstrating inherent radical bias. I think people are right to be concerned.

You are misrepresenting the situation. Users asked neutral questions, and the generated responses literally began praising Hitler.