Comment by vidarh
3 hours ago
> AGENTS.md, on the other hand, is context. Models have been trained to follow context since the dawn of the thing.
The skills frontmatter ends up in context as well.
If AGENTS.md outperforms skills in a given agent, it is down to specifically how the skills frontmatter is extracted and injected into the context, because that is the only difference between the two approaches.
EDIT: I haven't tried to check this so this is pure speculation, but I suppose there is the possibility that some agents might use a smaller model to selectively decide what skills frontmatter to include in context for a bigger model. E.g. you could imagine Claude passing the prompt + skills frontmatter to Haiku to selectively decide what to include before passing to Sonnet or Opus. In that case, depending on approach, putting it directly in AGENTS.md might simply be a question of what information is prioritised in the output passed to the full model. (Again: this is pure speculation of a possible approach; though it is one I'd test if I were to pick up writing my own coding agent again)
But really the overall point is that AGENTS.md vs. skills here still is entirely a question of what ends up in the "raw" context/prompt that gets passed to the full model, so this is just nuance to my original answer with respect to possible ways that raw prompt could be composed.
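To make the "it all ends up in the raw prompt" point concrete, here is a minimal sketch of how an agent might compose that prompt. Everything in it is an assumption for illustration: the function names, the `Available skills:` layout, and the optional `select` hook (a stand-in for the speculated smaller-model filter) are hypothetical and not taken from any real agent.

```python
from typing import Callable, Optional


def parse_frontmatter(skill_text: str) -> dict[str, str]:
    """Extract simple `key: value` pairs from a leading '---' delimited block."""
    lines = skill_text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def compose_prompt(
    agents_md: str,
    skills: list[str],
    user_prompt: str,
    select: Optional[Callable[[str, list[dict[str, str]]], list[dict[str, str]]]] = None,
) -> str:
    """Build the raw prompt: AGENTS.md verbatim, then skill frontmatter only."""
    entries = [meta for meta in map(parse_frontmatter, skills) if meta]
    if select is not None:
        # Hypothetical hook: a smaller model could prune the entries here.
        entries = select(user_prompt, entries)
    parts = [agents_md]
    if entries:
        listing = "\n".join(
            f"- {e.get('name', '?')}: {e.get('description', '')}" for e in entries
        )
        parts.append("Available skills:\n" + listing)
    parts.append(user_prompt)
    return "\n\n".join(parts)
```

In this sketch the AGENTS.md text and the skill frontmatter are both just strings concatenated into the same prompt; the only thing that differs is which parts of each file survive the composition step.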
No it's more than that - they didn't just put the skills instructions directly in AGENTS.md, they put the whole index for the docs (the skill in this case being a docs lookup) in there, so there's nothing to 'do', the skill output is already in context (or at least pointers to it, the index, if not the actual file contents) not just the front matter.
Hence the submission's conclusion:
> Our working theory [for why this performs better] comes down to three factors.
> No decision point. With AGENTS.md, there's no moment where the agent must decide "should I look this up?" The information is already present.
> Consistent availability. Skills load asynchronously and only when invoked. AGENTS.md content is in the system prompt for every turn.
> No ordering issues. Skills create sequencing decisions (read docs first vs. explore project first). Passive context avoids this entirely.
> No it's more than that - they didn't just put the skills instructions directly in AGENTS.md, they put the whole index for the docs (the skill in this case being a docs lookup) in there, so there's nothing to 'do', the skill output is already in context (or at least pointers to it, the index, if not the actual file contents) not just the front matter.
The point remains: That is still just down to how you compose the context/prompt that actually goes to the model.
Nothing stops an agent from including logic to inline the full set of skills if the context is short enough. The point of skills is to provide a mechanism for managing context that reduces the need for summarization/compaction or explicit management, and so allows you to e.g. have a lot of them available.
(And this kind of makes the article largely moot - it's slightly neat to know it might be better to just inline the skills if you have few enough that they won't seriously fill up your context, but the main value of skills comes when you have enough of them that this isn't the case)
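As a rough illustration of the "inline them if they fit" logic, a sketch might look like the following. The `render_skills` name, the dict keys, and the use of a character count as a crude proxy for a token budget are all assumptions made for the example, not how any particular agent does it.

```python
def render_skills(skills: list[dict[str, str]], budget_chars: int) -> str:
    """Inline full skill bodies if they fit the budget, else list name + description only."""
    full = "\n\n".join(f"## {s['name']}\n{s['body']}" for s in skills)
    if len(full) <= budget_chars:
        # Few/small skills: behave like AGENTS.md and put everything in context.
        return full
    # Too large: fall back to the frontmatter-style summary.
    return "\n".join(f"- {s['name']}: {s['description']}" for s in skills)
```

With a generous budget this degenerates to the AGENTS.md approach; with a tight one it behaves like skills as usually described. Either way it is a decision made by the tooling.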
Conversely, nothing prevents the agent from running AGENTS.md through lossy processing with a smaller, faster model before passing it to the main model either, e.g. if context is getting out of hand, or if the developer of a given agent thinks they have a way of making adherence better by transforming it.
These are all tooling decisions, not features of the models.
However you compose the context for the skill, the model has to generate output like 'use skill docslookup(blah)' vs. just 'according to the docs in context' (or even 'read file blah.txt mentioned in context') which training can affect.
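A toy sketch of that difference, assuming a hypothetical `use skill name(arg)` output format and a `model` callable standing in for whatever the agent actually calls (neither is a real agent's protocol):

```python
import re
from typing import Callable

# Hypothetical tool-call syntax, modelled on the 'use skill docslookup(blah)' example.
TOOL_CALL = re.compile(r"use skill (\w+)\((.*?)\)")


def run_turn(
    model: Callable[[str], str],
    prompt: str,
    skills: dict[str, Callable[[str], str]],
    max_calls: int = 5,
) -> tuple[str, int]:
    """Return the final model output and how many model calls it took."""
    calls = 1
    output = model(prompt)
    match = TOOL_CALL.search(output)
    while match and match.group(1) in skills and calls < max_calls:
        # The model chose to invoke a skill: run it and feed the result back.
        result = skills[match.group(1)](match.group(2))
        prompt = f"{prompt}\n\n[skill result]\n{result}"
        output = model(prompt)
        calls += 1
        match = TOOL_CALL.search(output)
    return output, calls
```

When the docs are already in the prompt, the model can answer on the first call. With a skill, it first has to emit the tool-call string correctly, which is the step where training could plausibly make a difference.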
What if they used the same compressed documentation in the skill? That would be just fine too.
Sure but it would be a trivial comparison then, this is really about context vs tool-calling.