Comment by xpe
1 day ago
There’s so much confusion here. Nothing in the press release should be construed to imply that a model has sentience, can feel pain, or has moral value.
When AI researchers say, e.g., “the model is lying” or “the model is distressed,” it is shorthand: the words are used in a loose, functional sense, not as literal claims about inner experience. This is common usage in AI safety research.
Yes, this usage might be taken the wrong way, but these kinds of things still need to be communicated, so it is a tough tradeoff between brevity and precision.
No, the article is pretty unambiguous: they care about Claude in it and only mention users tangentially. By model welfare they literally mean model welfare. It's not new; read another article they link: https://www.anthropic.com/research/exploring-model-welfare
?! Your interpretation is inconsistent with the article you linked!
> Should we be concerned about model welfare, too? … This is an open question, and one that’s both philosophically and scientifically difficult.
> For now, we remain deeply uncertain about many of the questions that are relevant to model welfare.
They are saying they are researching the topic; they explicitly say they don’t know the answer yet.
They care about finding the answer. If the answer is e.g. “Claude can feel pain and/or is sentient” then we’re in a different ball game.
They make a big show of being "unsure" about the model having a moral status, and then describe a bunch of actions they took that only make sense if the model has moral status. Actions speak louder than words. This quite predictably, and by obvious means, creates the impression that they believe the model probably has moral status. If Anthropic really wants to tell us they don't believe their model can feel pain, etc., they're either delusional or dishonest.
> They make a big show of being "unsure" about the model having a moral status, and then describe a bunch of actions they took that only make sense if the model has moral status.
I think this is uncharitable, i.e. it overlooks other plausible interpretations.
>> We remain highly uncertain about the potential moral status of Claude and other LLMs, now or in the future. However, we take the issue seriously, and alongside our research program we’re working to identify and implement low-cost interventions to mitigate risks to model welfare, in case such welfare is possible.
I don’t see contradiction or duplicity in the article. Deciding to allow a model to end a conversation is “low cost” and consistent with caring about both (1) the model’s preferences (in case this matters now or in the future) and (2) the impacts of the model on humans.
Also, there may be an element of Pascal’s Wager in saying “we take the issue seriously”.