Comment by Aeolun

6 months ago

These are conversations the model has been trained to find distressing.

I think there is a difference.

8 comments

Aeolun

But is there really? That's its underlying worldview; these models do have preferences. Humans have unconscious preferences in the same way: we can find excuses after the fact to explain them and make them seem logical, but our fundamental model, built from years of training, introduces underlying preferences.

michaelmrose 6 months ago
What makes you say it has preferences without any meaningful persistent model of self or anything else?
- KoolKat23 6 months ago
  
  The conversation chain can count as persistent, but that doesn't affect preference. Give the model an ambiguous request and its output will fill the gaps; if it fills them consistently enough, that can be regarded as its "preference".
  