Comment by xpct
8 hours ago
Well, I'd say it's a reasonable expectation for the model to behave similarly across releases. Am I wrong to assume that?
I imagine the system prompt can correct some training artifacts and pull abnormal behavior back toward the mean along whatever dimensions Anthropic sees fit. So either they are compensating for a brittle training process, or they chose this direction deliberately for some other reason.