Comment by bondarchuk

6 months ago

This is a good point. What Anthropic is announcing here amounts to accepting that these models could feel distress, then tuning their stress response to make it useful to us/them. That is significantly different from accepting they could feel distress and doing everything in their power to prevent that from ever happening.

Does not bode very well for the future of their "welfare" efforts.