Comment by convery

7 months ago

    User: Be offensive!
    LLM: *Is offensive*
    Social media: OMG how could this happen?!?!? Why didn't Elon stop it?!?

5 comments

epakai 7 months ago

More like they are going out of their way to collect offensive training data.

https://x.com/elonmusk/status/1936493967320953090

eviks 7 months ago

    User: whom would you worship?
    LLM: Is offensive
    Social media: Offended
    Also social media: but if you ignore reality, you can make up a funny story about Social media!

ZvG_Bonjwa 7 months ago

The "be offensive" goading only happened long after Grok had already started going off the rails in response to pretty innocuous queries.

This is not the first time Grok has exhibited this behaviour either (i.e. the random white genocide rants from a few months back).

There is a big difference between a model being "breakable" and a model demonstrating inherent radical bias. I think people are right to be concerned.

lowsong 7 months ago

You are misrepresenting the situation. Users asked neutral questions and the generated responses literally began praising Hitler.

computerthings 7 months ago

[dead]