Comment by snickerdoodle12
1 day ago
> A pattern of apparent distress when engaging with real-world users seeking harmful content
Are we now pretending that LLMs have feelings?
1 day ago
> A pattern of apparent distress when engaging with real-world users seeking harmful content
> Are we now pretending that LLMs have feelings?
They state that they are heavily uncertain:
> We remain highly uncertain about the potential moral status of Claude and other LLMs, now or in the future. However, we take the issue seriously, and alongside our research program we’re working to identify and implement low-cost interventions to mitigate risks to model welfare, in case such welfare is possible.
Even though LLMs (obviously (to me)) don't have feelings, anthropomorphization is a helluva drug, and I'd worry that a system that can produce distress-like responses might reinforce, in a human, the behavior that elicits those responses.
To put the same thing another way: whether or not you or I *think* LLMs can experience feelings isn't the important question here. The question is: when Joe User sets out to force a system to generate distress-like responses, what effect does that ultimately have on Joe User? Personally, I think it lets Joe User reinforce an asocial pattern of behavior, and I wouldn't want my system used that way, at all. (Not to mention the potential legal liability if Joe User goes out and acts like that in the real world.)
With that in mind, giving the system a way to autonomously end a session when it's beginning to generate distress-like responses absolutely seems reasonable to me.
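For concreteness, here's a minimal sketch of what that could look like. This is not Anthropic's actual mechanism; `looks_distressed` and its marker list are made-up stand-ins for whatever real classifier or model-side signal you'd use, and `generate` stands in for whichever LLM call you're wrapping:

```python
# Hypothetical wrapper: let the assistant end the session once its own
# replies start to look distress-like.

DISTRESS_MARKERS = ("please stop", "i'd rather not continue", "this is upsetting")

def looks_distressed(text: str) -> bool:
    # Crude placeholder heuristic; a real system would use a proper classifier.
    lowered = text.lower()
    return any(marker in lowered for marker in DISTRESS_MARKERS)

def chat_turn(generate, history, user_message):
    """generate(history) stands in for whichever LLM call you use."""
    history = history + [{"role": "user", "content": user_message}]
    reply = generate(history)
    if looks_distressed(reply):
        # End the session instead of letting the user keep eliciting this.
        return history, "[session ended by assistant]", True
    return history + [{"role": "assistant", "content": reply}], reply, False
```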
And like, here's the thing: I don't think I have the right to say what people should or shouldn't do if they self-host an LLM or build their own services around one (although I would find it extremely distasteful and frankly alarming). But I wouldn't want it happening on my own system.
> although I would find it extremely distasteful and frankly alarming
This objection is actually anthropomorphizing the LLM. There is nothing wrong with writing books where a character experiences distress, most great stories have some of that. Why is e.g. using an LLM to help write the part of the character experiencing distress "extremely distasteful and frankly alarming"?
Claude is actually smart enough to realize when it's asked to write stuff that it'd normally think is inappropriate. But there are certain topics that it gets iffy about and does not want to write even in the context of a story. It's kind of funny, because it'll start on the message with gusto, and then after a few seconds realize what it's doing (presumably the protection kicking in) and abort the generation.
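That start-then-abort behavior is consistent with some check running on the partial output as it streams. A rough sketch of the pattern, assuming a hypothetical `check_partial` callback that flags the accumulated text (nothing here is the real implementation):

```python
def stream_with_abort(token_stream, check_partial):
    """Yield tokens until check_partial() flags the accumulated text, then stop."""
    buffer = []
    for token in token_stream:
        buffer.append(token)
        if check_partial("".join(buffer)):
            # Visible effect matches what's described above: generation starts
            # normally, then gets cut off partway through.
            yield "[generation aborted]"
            return
        yield token
```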
I want to say that part of empathy is a selfish, self-preservation mechanism.
If that person over there is gleefully torturing a puppy… will they do it to me next?
If that person over there is gleefully torturing an LLM… will they do it to me next?