Comment by pamelafox
6 days ago
This is why I only add information to AGENTS.md when the agent has failed at a task. Then, once I've added the information, I revert the desired changes, re-run the task, and see if the output has improved. That way, I can have more confidence that AGENTS.md has actually improved coding agent success, at least with the given model and agent harness.
I do not do this for all repos, but I do it for the repos where I know that other developers will attempt very similar tasks, and I want them to be successful.
You can also save time/tokens: if you see that every request starts by looking for the same information, you can front-load it.
Also, take the randomness out of it. Sometimes the agent executes tests one way, sometimes another.
I've found https://github.com/casey/just to be very, very useful. It lets you bind common commands to simple, shorter ones that can be easily referenced. Good for humans too.
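For example, a minimal justfile might look like this (the recipe names and commands are hypothetical placeholders, not from any particular repo; swap in whatever your project actually runs):

```just
# justfile — hypothetical recipes; replace with your repo's real commands
test:
    pytest tests/

lint:
    ruff check .

serve:
    python -m http.server 8000
```

Then the agent (or a new contributor) only needs `just test` or `just lint` instead of rediscovering the exact invocation every time.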
Don't forget to update it regularly, then.
That's a sensible approach, but it still won't give you 100% confidence. These tools produce different output even when given the same context and prompt, so you can't really be certain that a difference in output comes from the single variable you changed.
So true! I've also set up automated evaluations using the GitHub Copilot SDK so that I can re-run the same prompt and measure the results. I only use that when I want even more confidence, typically when I want to compare models more precisely. I do find that the results have been fairly similar across runs for the same model/prompt/settings, even though we can't set a seed for most models/agents.
Same with people: no matter what info you give a person, you can't be sure they'll follow it the same way every time.
Agree. I've also found that a rule-discovery approach like this performs better. It's like teaching a student: they've probably already performed well on some tasks, and if we feed them an extra rule for something they're already well versed in, it can hinder their creativity.