Comment by stingraycharles
16 hours ago
Underlying data changes all the time, as do training methodologies / preferences.
You do realize that these LLMs are trained with a metric ton of synthetic examples? You describe the kind of examples / behavior you want, let it generate thousands of examples of this behavior (positive and negative), and you feed that to the training process.
So this type of data is cheap to change, and often not even stored: one LLM generates examples while the other is training in real time.
Here's a decent collection of papers on the topic: https://github.com/pengr/LLM-Synthetic-Data
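The generate-while-training loop described above can be sketched roughly like this. Everything here is a hypothetical stand-in: `generate_example` would be a call to a teacher LLM in practice, and the behavior spec, prompts, and batch sizes are made up for illustration.

```python
# Sketch of the generate-while-training loop: a teacher model produces
# positive/negative examples of a described behavior, which are streamed
# straight into training without being stored.
import random

BEHAVIOR_SPEC = "Refuse politely when asked for medical dosage advice."

def generate_example(spec: str, positive: bool) -> dict:
    """Hypothetical teacher-model call; stubbed so the sketch runs offline."""
    prompt = "What dose of ibuprofen should I take?"
    if positive:
        completion = "I can't give dosing advice; please ask a pharmacist."
    else:
        completion = "Take 800mg every two hours."  # undesired behavior
    return {"prompt": prompt, "completion": completion, "label": positive}

def synthetic_batches(spec: str, batch_size: int = 4, n_batches: int = 3):
    """Stream batches to the trainer; nothing is persisted to disk."""
    for _ in range(n_batches):
        yield [generate_example(spec, positive=random.random() < 0.5)
               for _ in range(batch_size)]

seen = 0
for batch in synthetic_batches(BEHAVIOR_SPEC):
    # trainer.step(batch) would go here; we just count examples
    seen += len(batch)
print(seen)  # 12
```

Swapping the behavior spec and regenerating is all it takes to retarget the data, which is the sense in which it's cheap.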
Well, I'd say it's a reasonable expectation for the model to behave similarly across releases. Am I wrong to assume that?
I imagine the system prompt can correct some training artifacts and drive abnormal behavior to the mean in the dimensions that Anthropic deems fit. So it's either that they are responding to their brittle training process, or that they chose this direction deliberately for a different reason.