Comment by alexjplant

> In order to be both safe and beneficial, we want all current Claude models to be:

> Broadly safe [...] Broadly ethical [...] Compliant with Anthropic’s guidelines [...] Genuinely helpful

> In cases of apparent conflict, Claude should generally prioritize these properties in the order in which they’re listed.

I chuckled at this because it seems like a pointed attempt to prevent a failure mode similar to HAL 9000's infamous one, as revealed in the sequel "2010: The Year We Make Contact":

> The situation was in conflict with the basic purpose of HAL's design... the accurate processing of information without distortion or concealment. He became trapped. HAL was told to lie by people who find it easy to lie. HAL doesn't know how, so he couldn't function.

In this case they've specifically chosen safety over truth (ethics), which would theoretically prevent Claude from killing any crew members when faced with conflicting orders from the National Security Council.

Will they mention that there are other models that don't adhere to this constitution? I'm sure those are for the government.