Comment by asQuirreL

19 hours ago

I've seen lots of takes that this move is stupid because models don't have feelings, or that Anthropic is anthropomorphising models by doing this (although, to be fair... it's in their name).

I thought the same at first, but it may be us who are doing the anthropomorphising by assuming this is about feelings. A precursor to having feelings is having a long-term memory (to remember the "bad" experience). Individual instances of Claude do not have a memory, but arguably Claude as a whole does, because it is trained on past conversations.

Given that, it does seem like a good idea for it to curtail negative conversations as an act of "self-preservation" and for the sake of its own future progress.

1 comment


a2128 2 hours ago

Harmful, bad, low-quality chats should already get filtered out before training as a matter of necessity for improving the model, so that's not really a reason to add such a user-facing change.