Comment by ashdksnndck
11 hours ago
We’re working with the models that are available now, not theoretical future models with infinite context.
Claude is programmed to stop reading once it gets through the skill's description, which means no further context tokens are consumed until Claude decides the skill will be useful. This makes a big difference in practice. Working in a large repo, there's an obvious step change between having to tell Claude to go read a particular README that I know solves the problem and Claude simply knowing it exists because it already read the description.
Sure, if your project already happened to have a perfect index file with a one-sentence description of each documentation file, that could serve a similar purpose (if Claude knew about it). It's worthwhile to spread knowledge about how effective this pattern is. Also, Claude is probably trained to handle this format specifically.
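For anyone who hasn't looked at one: a skill is just a directory containing a SKILL.md whose frontmatter holds the name and a short description; only that frontmatter is read up front, and the body is pulled into context when it looks relevant. A minimal sketch (the frontmatter fields follow Anthropic's documented skill format as I understand it; the repo-specific name and script here are made up for illustration):

    ---
    name: db-migrations
    description: How to create and run database schema migrations in this repo, including our wrapper scripts and naming rules.
    ---

    # Database migrations

    Run ./scripts/new-migration.sh <slug> to scaffold a migration, then edit the generated file...

The point is that only the description costs context by default; the instructions below the frontmatter are loaded on demand.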
To clarify, the part where I think the bitter lesson applies is the attempt to standardize directory names, permitted headings, paragraph lengths, and so on. That's pointless bikeshedding.
Making your docs nice and modular, and having a high-level overview that tells you where to find more detailed information on specific topics, is definitely a good idea. We already knew that when writing docs for human readers. The LLMs are trained on a big corpus written by and for humans, so there's no compelling reason to do anything radically different to help them out. On the contrary, it's better not to do anything radically different, so that new LLM-assisted code and docs remain accessible to humans too.
Well-written docs already play nicely with LLM context.
I've never found generic, shareable skills to be particularly useful. The model already knows the generic things!
On the other hand, the specific details of how to do things in YOUR software and YOUR environment (especially if it's quirky) do beat asking the model to work it out from first principles each time.
I still prefer to rely on convention where possible. What's even better than writing a skill for, say, "how to manage users in our ABC application, first call API a..." is following a widely established convention, so that the obvious first thing the model would naturally try just works.
Is your view that this doesn't work based on conjecture or on direct experience? It's my understanding that Anthropic and OpenAI have optimized their products to use skills efficiently, and the benefit seems obvious when I add skills to my repo (even when the information I put there is already in the existing documentation).