Comment by alex_metacraft

15 days ago

This is a really interesting finding. It makes sense when you think about what the training data looks like — first-person statements in a system prompt pattern-match to "internal monologue" or "chain of thought" examples, which the model has been heavily trained to follow through on. Second-person commands pattern-match to user instructions, which the model has also been trained to sometimes push back on or reinterpret.

There's probably a related effect with imperative vs. declarative framing in skills too. "When the user asks about X, do Y" seems to work worse than "This project uses Y for X" in my experience. The declarative version reads like a fact about the world rather than a command to obey, and models seem to treat facts as more reliable context.

Would be curious if someone has tested this systematically across different models. The optimal framing might vary quite a bit between Claude, Gemini, and GPT.
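For anyone who wants to try it, here's a minimal sketch of the kind of A/B harness I mean. Everything here is hypothetical: `ask()` is a stub you'd wire to each provider's real SDK, and the model names and framing strings are just placeholders, not anyone's actual skill text.

```python
from itertools import product

# Hypothetical paired framings of the same instruction — the imperative
# ("do Y when X") vs. declarative ("this project uses Y") styles from above.
FRAMINGS = {
    "imperative": "When the user asks about dates, use ISO 8601 format.",
    "declarative": "This project uses ISO 8601 format for all dates.",
}

# Placeholder model identifiers; swap in real model strings per provider.
MODELS = ["claude", "gemini", "gpt"]


def ask(model: str, system: str, question: str) -> str:
    # Stub standing in for a real API call; replace with each
    # provider's SDK. Returning a tagged string keeps the sketch runnable.
    return f"[{model}] system={system!r} question={question!r}"


def run_trials(question: str, n: int = 1) -> dict:
    """Collect n replies for every (model, framing) pair."""
    results = {}
    for model, (name, system) in product(MODELS, FRAMINGS.items()):
        results[(model, name)] = [ask(model, system, question) for _ in range(n)]
    return results


if __name__ == "__main__":
    out = run_trials("What's the release date of v2.0?")
    for (model, framing), replies in sorted(out.items()):
        print(model, framing, len(replies))
```

The real work would be in scoring the replies (did the model actually comply with the instruction?), but keeping the framings as a paired dict means every model sees byte-identical prompts except for the one variable under test.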