Comment by cowlby
17 hours ago
I'm fascinated that Anthropic employees, who are supposed to be the LLM experts, are using tricks like these which go against how LLMs seem to work.
A key example for me was the "malware" tool-call section, which included a snippet to the effect of "if it's malware, refuse to edit the file". Because it appears dozens of times in a conversation, the LLM eventually gets confused and refuses to edit files that are not malware.
I've resorted to using tweakcc to patch many of these well-intentioned sections and re-work them to avoid LLM pitfalls.
These aren't so much tricks as one layer of defense. And as a defense, prompting alone is useless, since you can call the API directly without these prompts.
I run Claude Code with my own system prompt and tooling on top of it. tweakcc broke too often and had too many glitches.
They aren't necessarily experts at using LLMs. They have different incentives as well.
Was that an Anthropic issue, or a gpt-oss problem?