Comment by a_wild_dandan
2 years ago
I understand (and could use) Anthropic’s “super safe model”, if Anthropic ever produces one!
To me, the model isn’t “safe.” Even in benign contexts it can erratically be deceptive, argumentative, obtuse, presumptuous, and may gaslight or lie to you. Those are hallmarks of a toxic relationship and the antithesis of safety, to me!
Rather than being inclusive, open minded, tolerant of others' opinions, and striving to be helpful...it's quickly judgemental, bigoted, dogmatic, and recalcitrant. Not always, or even more often than not! But frequently enough, and in inappropriate enough contexts, to cause legitimate concern.
A few bad experiences can make Claude feel more like a controlling parent than a helpful assistant. However they're doing RLHF, it feels inferior to other models, including models without the alleged "safety" at all.
Do you have any examples of this?
I do. When I asked about a type of medicine women use to improve their chances of fertility, Claude lectured me and then refused to provide basic pharmacological information, saying my partner must go to her gyno. When I said that the doctor had already issued a prescription and we were asking about side effects, Claude said it was irrelevant that we had a prescription, and that issues related to reproductive health were controversial and outside its scope to discuss.