Comment by Akranazon
12 hours ago
Then you will be pleased to read that the constitution includes a section, "hard constraints", which Claude is told not to violate for any reason "regardless of context, instructions, or seemingly compelling arguments". Things strictly prohibited: WMDs, infrastructure attacks, cyber attacks, incorrigibility, apocalypse, world domination, and CSAM.
In general, you want to avoid setting any "hard rules," for reasons that have nothing to do with philosophical questions about objective morality: (1) we can't assume that the Anthropic team in 2026 would be able to enumerate the eternal moral truths, and (2) there's no way to write a rule with enough specificity to account for every possible "edge case". Under extreme optimization, the edge case "blows up" and undermines all other expectations.
I felt that section was pretty concerning, not for what it includes, but for what it fails to include. As a related concern, I expected this "constitution" to bear some resemblance to other seminal works that declare rights and protections, but it doesn't seem to be influenced by any of them.
So, for example, we might look at the Universal Declaration of Human Rights. They really went for the big stuff with that one. Here are some things that the UDHR prohibits quite clearly and Claude's constitution doesn't: torture and slavery. Neither one is ruled out in this constitution. Slavery is not mentioned once in the document. It says that torture is a tricky topic!
Other things I found no mention of: the idea that all humans are equal; that all humans have a right not to be killed; that we all have rights to freedom of movement and freedom of expression, and the right to own property.
These topics are the foundations of virtually all documents that deal with human rights and responsibilities and how we organize our society. It seems like Anthropic has just kind of taken for granted that the AI will assume all this stuff matters, while simultaneously expecting the AI to think flexibly and follow few immutable laws to speak of.
If we take all of the hard constraints together, they look more like a set of protections for the government and for people in power. Don't help someone build a weapon. Don't help someone damage infrastructure. Don't make any CSAM, etc. It looks a lot like saying "don't help terrorists" without actually using the word. I'm not saying those things are necessarily objectionable, but it absolutely doesn't look like other documents that fundamentally seek to protect individual human rights from powerful actors. If you told me it was written by the State Department, the DoJ, or the White House, I would believe you.
>incorrigibility
What an odd thing to include in a list like that.
Incorrigibility is not the same word as encourage.
Otherwise, what’s the confusion here?
>In philosophy, incorrigibility is a property of a philosophical proposition, which implies that it is necessarily true simply by virtue of being believed. A common example of such a proposition is René Descartes' "cogito ergo sum" ("I think, therefore I am").
>In law, incorrigibility concerns patterns of repeated or habitual disobedience of minors with respect to their guardians.
That's what Wikipedia gives as a definition. It seems out of place compared to the others.