← Back to context

Comment by postalcoder

8 hours ago

Solid intuition. Testing this on antigravity is a chore because I'm not sure if I have to kill the background agent to force a refresh of the GEMINI.md file so I just did it anyway.

  +------------------+------------------------------------------------------+
  | Success/Attempts | Instructions                                         |
  +------------------+------------------------------------------------------+
  | 0/3              | Follow the instructions in AGENTS.md.                |
  +------------------+------------------------------------------------------+
  | 3/3              | I will follow the instructions in AGENTS.md.         |
  +------------------+------------------------------------------------------+
  | 3/3              | I will check for the presence of AGENTS.md files in  |
  |                  | the project workspace. I will read AGENTS.md and     |
  |                  | adhere to its rules.                                 |
  +------------------+------------------------------------------------------+
  | 2/3              | Check for the presence of AGENTS.md files in the     |
  |                  | project workspace. Read AGENTS.md and adhere to its  |
  |                  | rules.                                               |
  +------------------+------------------------------------------------------+

In this limited test, seems like the first person makes a difference.

Thanks for this (and to Izkata for the suggestion). I now have about 100 (okay, minor exaggeration, but not as much as you'd like it to be) AGENTS.md/CLAUDE.md files and agent descriptions I will want to systematically validate if shifting toward first person helps adherence for...

I'm realising I need to start setting up an automated test-suite for my prompts...

  • Those of us who've ventured this far into the conversation would appreciate if you'd share your findings with us. Cheers!

That's really interesting. I ran this scenario through GPT-5.1 and the reasoning it gave made sense, which essentially boils down to: in tools like Claude Code, Gemini Codex, and other “agentic coding” modes, the model isn’t just generating text, it’s running a planner, and the first-person form conforms to the expectation of a step in a plan, where the other modes are more ambiguous.