Comment by einarfd
1 day ago
This seems fine to me.
Having these models terminate chats where the user persists in trying to get sexual content involving minors, or help with carrying out large-scale violence, won't be a problem for me, and it's also something I'm fine with no one getting help with.
Some might worry that they will refuse less problematic requests, and that might happen. But so far my personal experience is that I hardly ever get refusals. Maybe that's just me being boring, but it does mean I'm not worried about refusals.
The model welfare part I'm more sceptical of. I don't think we're at the point where the "distress" the model shows is something to take seriously. But on the other hand, I could be wrong, and allowing the model to stop the chat after saying no a few times, what's the problem with that? If nothing else it saves some wasted compute.
> Some might worry that they will refuse less problematic requests, and that might happen. But so far my personal experience is that I hardly ever get refusals.
My experience using it from Cursor is that I get refusals all the time under their existing content policy, for stuff that is the world's most mundane B2B back-office business software CRUD requests.
Claude will balk at far more innocent things, though. It is an extremely censored model, the most censored of the SOTA closed models.
If you are a materialist like me, then even the human brain is just the result of the laws of physics. OK, so what is distress to a human? You might define it as a certain set of physiological changes.
Lots of organisms can feel pain and show signs of distress, even ones much less complex than us.
The question of moral worth is ultimately decided by people and culture. In the future, some kinds of man-made devices might be granted moral value. There are lots of ways this could happen. (Or not.)
It could even just be a shorthand for property rights… here is what I mean. Imagine that I delegate a task to my agent, Abe. Let's say some human, Hank, interacting with Abe uses abusive language, and that this has a way of negatively influencing the agent's future behavior. Naturally, I don't want people damaging my property (Abe), because I would have to e.g. filter its memory and remove the bad behaviors resulting from Hank, which costs me time and resources. So I set up certain agreements about how people may interact with it. These are ultimately backed by the rule of law. At some level of abstraction, this might resemble e.g. animal cruelty laws.