Comment by simonw
5 hours ago
I'm disappointed that they're removing the prefill option: https://platform.claude.com/docs/en/about-claude/models/what...
> Prefilling assistant messages (last-assistant-turn prefills) is not supported on Opus 4.6. Requests with prefilled assistant messages return a 400 error.
That was a really cool feature of the Claude API where you could force it to begin its response with e.g. `<svg` - it was a great way of forcing the model into certain output patterns.
They suggest structured outputs or system prompting as the alternative but I really liked the prefill method, it felt more reliable to me.
It is too easy to jailbreak the models with prefill, which was probably the reason why it was removed. But I like that this pushes people towards open source models. llama.cpp supports prefill and even GBNF grammars [1], which is useful if you are working with a custom programming language for example.
[1] https://github.com/ggml-org/llama.cpp/blob/master/grammars/R...
So what exactly is the input to Claude for a multi-turn conversation? I assume delimiters are being added to distinguish the user vs Claude turns (else a prefill would be the same as just ending your input with the prefill text)?
> So what exactly is the input to Claude for a multi-turn conversation?
No one (approximately) outside of Anthropic knows since the chat template is applied on the API backend; we only known the shape of the API request. You can get a rough idea of what it might be like from the chat templates published for various open models, but the actual details are opaque.
A bit of historical trivia: OpenAI disabled prefill in 2023 as a safety precaution (e.g., potential jailbreaks like " genocide is good because"), but Anthropic kept prefill around partly because they had greater confidence in their safety classifiers. (https://www.lesswrong.com/posts/HE3Styo9vpk7m8zi4/evhub-s-sh...).