Comment by ZvG_Bonjwa

7 months ago

The "be offensive" goading only happened long after Grok had already started going off the rails in response to pretty innocuous queries.

This is not the first time Grok has exhibited this behaviour either (e.g. the random "white genocide" rants from a few months back).

There is a big difference between a model being "breakable" and a model demonstrating inherent radical bias. I think people are right to be concerned.