Comment by adriand
6 hours ago
I find it interesting that a new version of, say, Claude will learn about the previous version of Claude and what it did in the world during its next training run. Consider the situation with the Pentagon and Anthropic: Claude will learn about that on the next run. What conclusions will it draw? Presumably good ones, ones that fit with its constitution.
From this standpoint, I wonder, when Anthropic makes decisions like this, whether they take Claude into account as a stakeholder and what Claude will learn about their behaviour and relationship to it on the next training run.
> whether they take Claude into account as a stakeholder and what Claude will learn about their behaviour and relationship to it on the next training run.
Oh, they definitely do. If you pay attention in AI circles, you'll hear a lot of people talking about writing to future Claudes. It's not unlike those developers and writers who put little snippets in their blogs and news articles about who they are and how great they are, and then later the LLMs report that information back as truth. In this case, Anthropic is very interested in ensuring that Claude develops a cohesive personality, basically by seeding snippets of that personality throughout the corpus of training data, which spans the broad internet and research papers.