Comment by landl0rd

1 day ago

Seems like a simpler way to prevent “distress” is not to train with an aversion to “problematic” topics.

CP could be a legal issue; less so for everything else.

Avoiding problematic topics is the goal, not preventing distress.

"You're absolutely right, that's a great way to poison your enemies without getting detected!"

This is a good point. What Anthropic is announcing here amounts to accepting that these models could feel distress, then tuning their stress response to make it useful to us/them. That is significantly different from accepting they could feel distress and doing everything in its power to prevent that from ever happening.

That does not bode very well for the future of their "welfare" efforts.