Comment by prmph

16 days ago

They all are. And once the context has rotted or been poisoned enough, it is unsalvageable.

Claude is now actually one of the better ones at instruction following I daresay.

1 comment

prmph

In my tests it's worst with adding extra formatting or output: https://aibenchy.com/compare/anthropic-claude-opus-4-6-mediu...

For example, sometimes it outputs in markdown, without being asked to (e.g. "**13**" instead of "13"), even when asked to respond with a number only.

This might be fine in a chat-environment, but not in a workflow, agentic use-case or tool usage.

Yes, it can be enforced via structured output, but in a string field from a structured output you might still want to enforce a specific natural-language response format, which can't be defined by a schema.