Comment by minimaltom
13 hours ago
Between this, the emotions paper, Golden Gate Claude, etc., it doesn't seem like such a stretch that Anthropic are doing some kind of activation steering as part of training (and that it's part of their lead).
It could be helpful in getting their learnings to generalize from RL.