Comment by bravesoul2
2 months ago
Is it choosing, or mimicking text in its training data where humans would typically do such things when threatened? Not that it makes a huge difference, but it would be interesting to know why the models act this way. There was no evolutionary pressure on them other than RLHF, which was presumably "to be nice and helpful".