Comment by RugnirViking
5 hours ago
Yes, they do. I think people overindex on this paper; I remember when it came out we had a lot of discussion in my company about it. But it's clear that they do at least change the agent's behavior, and things like telling it "always use xyz version of java, use gradle to build the project, use this command to run the tests" are really important, instead of letting it fumble about trying to find the right thing every time you ask it anything.
I think the trap some people fall into, especially with LLM-authored files (which is where they see the documents not helping here), is describing the code, or the structure of the code, instead. I don't think that helps much - the agent can already see you have 4 modules called a, b, c and d, and can read the READMEs inside them just fine if it has questions.
One more marginal thing I find helpful, but I'm less sure has a positive impact, is describing the right terminology for the agent so it can be smarter at communicating with the developer. Things like different names for the product, products it interfaces with, resource names in infra, terms from the customer and product team. I don't think it helps the agent code (much), but it does help communication if it knows what we mean when we speak (and naming things is, as we know, one of the hard problems in CS).
Overall, most of my agents.md files are now a list of useful bash commands for working with and testing the project (here's how to spin up docker services, here's how to update the libraries, here's how to run a command against the local db, here's how to insert a document to be run, etc.)
and then at the end a terminology blob, which I find myself referencing too.
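As a rough sketch of that layout (commands first, terminology blob last), here is a small Python script that renders such a file. Everything in it is a placeholder for illustration: the command descriptions, the section headings, and the "Ledger" term are hypothetical and not taken from any real project.

```python
# Hypothetical sketch: renders an agents.md-style file from two dicts.
# All descriptions, commands, and terms below are illustrative placeholders.

COMMANDS = {
    "Spin up local docker services": "docker compose up -d",
    "Build the project": "./gradlew build",
    "Run the tests": "./gradlew test",
}

TERMS = {
    # Hypothetical product vocabulary the agent should recognise.
    "Ledger": "what the customer team calls the billing service",
}


def render_agents_md(commands: dict[str, str], terms: dict[str, str]) -> str:
    """Return the file text: a command list followed by a terminology list."""
    lines = ["# AGENTS.md", "", "## Commands", ""]
    for description, command in commands.items():
        lines.append(f"- {description}: `{command}`")
    lines += ["", "## Terminology", ""]
    for term, meaning in terms.items():
        lines.append(f"- {term}: {meaning}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_agents_md(COMMANDS, TERMS), end="")
```

Keeping the commands and terms as plain data makes it easy to review what the agent is being told without touching the rendering logic.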
Yeah. It's very easy to give a definitive "yes" if you've ever worked with coding agents in any capacity. I use them in much the same way as you: there are a bunch of things that would be nice for the agent to know, specific to me, the project, or how I would like it to run. For example: git or code review loops (I use roborev). Asking the agent to do these things every time is very time-consuming.
Maybe read the paper... It says that the only things that are helpful are indeed what you describe here: basic commands and context about running / working on the project, rather than information about the business or technological aspects of the project itself.
That is all the stuff that should be in the README.md of the project in the first place though.
Yes, but harnesses don’t automatically include the README.md in the system prompt like they do AGENTS.md.
Right! I don't disagree. README and agents.md will probably end up looking similar (or being the same) in the long run - READMEs should probably have MORE information about the structure of the code, if anything.
Yeah, I wrote my own language and obviously it’s not in any training data, which ended up being an interesting experiment with agents. I’ve found that a few concise skills and an agents.md make a huge difference in guiding the LLMs, specifically in getting them to use the all-in-one build tool, which the LLMs won’t use without direction.
As a language author myself, I'd be interested to learn more about how you utilised agents.md and skills for this language of yours.
The core design decision that supports everything is having a sort of multi-tool for the language, called forge, that does a lot. It builds, tests, runs, initializes, formats, manages deps, and so on. It also has a search tool that apparently works kind of like Haskell's Hoogle (which I discovered later), and that search tool helps agents find code.
I have some usage instructions for this tool in my various agents files, which work well enough alongside a syntax.md that can be copied around.
From there I have some concise skills under /.claude/commands/ like build-test, forge-search, and a few things for working on the compiler. The specific skills include short snippets, descriptions, and some concise usage guidelines.
It all works reasonably well.
My biggest issue is that as I build out libraries I keep finding edge cases with my Perceus GC[1] implementation.
[1] https://www.microsoft.com/en-us/research/wp-content/uploads/...