Comment by ben_w
18 days ago
While true, irrelevant.
This isn't Anthropic PBC's constitution, it's Claude's constitution. It applies to the models themselves, not the company, and its purpose is to train the models' behaviours: which behaviours the company wants the models to demonstrate and which to avoid.
Conway's law seems apt here. The behavior of Claude will mirror the behavior and structure of Anthropic. If Anthropic values one revenue source more highly than another, Claude's behavior will optimize towards that regardless of what was published here.
What a company or employee "wants" and how a company is funded are usually diametrically opposed, the latter always taking precedence. Don't be evil!
Yes, but that is a different level of issue. To analogise in two different ways: first, it's like, sure, Microsoft can be ordered by the US government to spy on people and to backdoor crypto. Absolutely, 100%, and most world governments are probably now asking themselves what to do about that. But what you said was kinda like someone saying of Microsoft that, because of this, their internal code of conduct is therefore pointless.
Or if that doesn't suit you: yes, sure, there's a large flashing sign on the motorway warning of an accident 50 miles ahead of you, and if you do nothing this will absolutely cause you problems, but that doesn't make the lane markings you're currently following a "waste of effort".
Also, because this is published work, they're showing everyone else, including open-weights providers, things which may benefit us when it comes to those models as well.
Unfortunately, I say "may" rather than "will", because if you put in a different constitution you could almost certainly get a model whose AI equivalent of a "moral compass" is tuned to support anything from anarchy to totalitarianism, from mafia rule to self-policing, and similarly for all the other axes people care about. There would be a separate totalitarian/mafia/etc. variant for each specific group that wants to seek power; cf. how Grok was saying Musk is best at everything, no matter how nonsensical the comparison was.
But that's also a different question. The original alignment problem is alignment "at all", which we seem to be making progress on; once we've properly solved "at all", then we get to face the problem of "aligned with whom?"