
Comment by redbell

2 days ago

> But <output>? Most have never touched it. Some don’t even know it exists.

Yeah, count me in with those who don't even know it exists. I'm adding this to my TIL.

> When I searched GitHub public repos, it barely showed up at all.

> That absence creates a feedback loop: if no one teaches it, no one uses it.

This triggered an instant question in my head: do LLMs actually use it when generating code, or are they not well trained on this specific tag?

I, too, am concerned about AIs not reading the docs. What happens when a new W3C spec comes out and most people are vibe coding? If AIs don't take current specs into account and just regurgitate old code patterns, then disseminating spec updates or new specs will be harder than it already is.

  • Most people don't care about W3C specs as it is, never mind with vibe coding. The React release notes are the important web standards they follow.

  • Alternate problem: I was trying to fix up an old repo I found that used some PDF tool. When I tried an LLM, it insisted on reading all the documentation, but the docs were woefully out of date and didn't match the actual binaries, so it got terribly twisted up.

  • Yeah, LLMs don’t read docs. They repeat the info in docs, and swap letters around in the code to make it fit.

LLMs generate code based on statistical patterns found in vast amounts of training data from existing projects, not by reading language specifications. If the tag is rare in the wild, it will be rare in their output.

  • LLMs also don't know about new MCP tools at training time, but they use them perfectly fine when presented with information about them.

    AI software development models and agents can be "taught" via the prompt to go look at the official, version-specific documentation for a language (just as one example), without needing to modify their weights.

    One call to getSpecification('Python','3.14') or some similar tool call and they know exactly what they are working with, even if the language version did not exist when the model was trained.

  • I mean, they're trained on specs, too. I'll have to play with asking for semantic HTML and see what they come up with.
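
For what it's worth, the getSpecification idea mentioned above might look roughly like the sketch below. Everything here is an assumption for illustration: `get_specification`, `build_prompt`, and the `SPECS` store are made-up names, the stored text is a placeholder rather than a quote from any spec, and a real tool would fetch the official docs instead of reading a dict.

```python
from typing import Dict, Tuple

# Hypothetical spec store standing in for a real documentation source.
# The text is a placeholder written for this sketch, not a quotation.
SPECS: Dict[Tuple[str, str], str] = {
    ("html", "living"): "Placeholder: <output> holds the result of a calculation or user action.",
}


def get_specification(name: str, version: str) -> str:
    """Return stored spec text for (name, version), or a clear miss message."""
    key = (name.lower(), version.lower())
    return SPECS.get(key, f"No specification stored for {name} {version}.")


def build_prompt(task: str, name: str, version: str) -> str:
    """Prepend the looked-up spec text to the task so the model sees it in context."""
    spec = get_specification(name, version)
    return f"Reference ({name} {version}):\n{spec}\n\nTask:\n{task}"


if __name__ == "__main__":
    print(build_prompt("Write a form that shows a running total.", "HTML", "living"))
```

The point is only that the lookup result lands in the prompt, so the model can work from it without any change to its weights. Whether it actually prefers that text over patterns from training is a separate question.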