The train/test split is one of the fundamental building blocks of current-generation models, so they’re assuming familiarity with that.
At a high level, training takes in training data and produces model weights, and “test time” takes model weights and a prompt to produce output. Every end user has the same model weights, but different prompts. They’re saying that the constitution goes into the training data, while CLAUDE.md goes into the prompt.
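If it helps, here is a toy sketch of that distinction. It is not how Claude or any real model is trained: `train`, `generate`, and the word-count "weights" are made-up stand-ins, only meant to show which input goes in at which stage.

```python
from collections import Counter


def train(training_docs):
    """Toy 'training': turn documents into 'weights' (here, just word counts)."""
    weights = Counter()
    for doc in training_docs:
        weights.update(doc.lower().split())
    return weights


def generate(weights, prompt):
    """Toy 'test time': the weights are fixed; only the prompt varies."""
    known = [w for w in prompt.lower().split() if w in weights]
    return f"prompt words seen in training: {known}"


# Training happens once. In this analogy, the constitution would be
# part of the training data (hypothetical example strings).
weights = train(["be helpful and honest", "be harmless"])

# Every user shares the same weights but supplies a different prompt.
# In this analogy, a CLAUDE.md-style file is just text placed in the prompt.
out_a = generate(weights, "project notes: be concise. question: is this honest")
out_b = generate(weights, "write a poem")
```

The point of the sketch is only that `weights` is computed once and never changes between calls, while everything that differs between users arrives through `prompt`.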
Right, I'm saying "model training" is vague enough that I have no idea what Claude actually does with this document.
Edit: This helps: https://arxiv.org/abs/2212.08073