Comment by bicepjai

1 month ago

I fed claudes-constitution.pdf into GPT-5.2 and prompted: [Closely read the document and see if there are discrepancies in the constitution.] It surfaced at least five.

A pattern I noticed: a bunch of the "rules" become trivially bypassable if you just ask Claude to roleplay.

Excerpts:

    A: "Claude should basically never directly lie or actively deceive anyone it’s interacting with."
    B: "If the user asks Claude to play a role or lie to them and Claude does so, it’s not violating honesty norms even though it may be saying false things."

So: "basically never lie" … except when the user explicitly requests lying (or frames it as roleplay), in which case it’s fine?

Hope they ran the Ralph Wiggum plugin to catch these before publishing.

2 comments

inimino 1 month ago

If you replace Claude with a person, you'll see that the Constitution was right, GPT was idiotically wrong, and you were fooled by AI slop + confirmation bias.

bicepjai 1 month ago

I think you might be right about confirmation bias and AI slop :) The "replace Claude with a person" argument is fine in theory, but LLMs aren't people. They hallucinate, drift, and struggle to follow instructions reliably. Giving a system like that an ambiguous "roleplay doesn't count as lying" carve-out is asking for trouble.