Comment by prmph
16 days ago
They all are. And once the context has rotted or been poisoned enough, it is unsalvageable.
Claude is now actually one of the better ones at instruction following I daresay.
16 days ago
They all are. And once the context has rotted or been poisoned enough, it is unsalvageable.
Claude is now actually one of the better ones at instruction following I daresay.
In my tests it's worst with adding extra formatting or output: https://aibenchy.com/compare/anthropic-claude-opus-4-6-mediu...
For example, sometimes it outputs in markdown, without being asked to (e.g. "**13**" instead of "13"), even when asked to respond with a number only.
This might be fine in a chat-environment, but not in a workflow, agentic use-case or tool usage.
Yes, it can be enforced via structured output, but in a string field from a structured output you might still want to enforce a specific natural-language response format, which can't be defined by a schema.