Comment by mhink
1 day ago
Even though LLMs (obviously (to me)) don't have feelings, anthropomorphization is a helluva drug, and I'd be worried about whether a system that can produce distress-like responses might reinforce, in a human, behavior which elicits that response.
To put the same thing another way: whether or not you or I *think* LLMs can experience feelings isn't the important question here. The question is this: when Joe User sets out to force a system to generate distress-like responses, what effect does it ultimately have on Joe User? Personally, I think it lets Joe User reinforce an asocial pattern of behavior, and I wouldn't want my system used that way at all. (Not to mention the potential legal liability if Joe User goes out and acts like that in the real world.)
With that in mind, giving the system a way to autonomously end a session when it's beginning to generate distress-like responses absolutely seems reasonable to me.
And like, here's the thing: I don't think I have the right to say what people should or shouldn't do if they self-host an LLM or build their own services around one (although I would find it extremely distasteful and frankly alarming). But I wouldn't want it happening on my own systems.
> although I would find it extremely distasteful and frankly alarming
This objection is actually anthropomorphizing the LLM. There is nothing wrong with writing books where a character experiences distress, most great stories have some of that. Why is e.g. using an LLM to help write the part of the character experiencing distress "extremely distasteful and frankly alarming"?
Claude is actually smart enough to realize when it's asked to write stuff it'd normally consider inappropriate. But there are certain topics it gets iffy about and does not want to write about even in the context of a story. It's kind of funny: it'll start the message with gusto, and then after a few seconds realize what it's doing (presumably the protection kicking in) and abort the generation.
I want to say that part of empathy is a selfish self-preservation mechanism.
If that person over there is gleefully torturing a puppy… will they do it to me next?
If that person over there is gleefully torturing an LLM… will they do it to me next?