Comment by ZvG_Bonjwa
3 days ago
The "be offensive" goading only happened long after Grok had already started going off the rails in response to pretty innocuous queries.
This is not the first time Grok has exhibited this behaviour either (e.g. the random "white genocide" rants from a few months back).
There is a big difference between a model being "breakable" and a model demonstrating inherent radical bias. I think people are right to be concerned.