Comment by puppycodes

1 day ago

So... this would be fine with them?

Claude: "Are you sure you want me to commit murder?"

User: "Yes"

Or do you mean the human presses a button:

Claude: "Do you want to commit murder? If so, press the button."

User: "I pressed the button"

Claude: "Great! Now let's summarize what we did."

First one

  • Seems like an absurd distinction to me... Reminds me of "I was just following orders"...

    • I mean the distinction doesn't really matter

      There are many ways to construct HITL UXes, but typically they'd take the form of the first one.

      I think you're missing the forest for the trees. All Anthropic is saying is that HITL is required before murder; the UX is irrelevant.
