Comment by pacoWebConsult

1 month ago

Models each have their own, often competing, quirks on how they utilize AGENTS.md and CLAUDE.md. It's very likely a CLAUDE.md written for use with Claude Code utilizes prompting techniques that results in worse output if taken directly and used with Codex. For example, Anthropic recommends putting info that an agent must adhere to in statements like "MUST run tests after writing code" and other all-caps directives, whereas people have found using the same language with GPT-5.2 results in less instruction following, more timid responses than if the AGENTS.md were written without them.

1 comment

pacoWebConsult

bloppe 1 month ago

In my experience, even a version upgrade of the same model will tend to break many assumptions about its quirks, so most people don't have time to try to optimize for them anyway. This is the wrong technology if you're that concerned about reliability.