Comment by guitarsteve
3 years ago
I’ve often heard this (YAML is a superset of JSON) but never looked into the details.
According to https://yaml.org/spec/1.2.2/, YAML 1.2 (from 2009) is a strict superset of JSON. Earlier versions were an _almost_ superset. Hence the confusion in this thread. It depends on the version…
The CPAN link provided by the parent says 1.2 still isn't a superset:
> Addendum/2009: the YAML 1.2 spec is still incompatible with JSON, even though the incompatibilities have been documented (and are known to Brian) for many years and the spec makes explicit claims that YAML is a superset of JSON. It would be so easy to fix, but apparently, bullying people and corrupting userdata is so much easier.
Are these documented YAML 1.2 JSON incompatibilities listed / linked to somewhere?
I assume these are something related to non-ASCII string encoding / escapes?
They are listed in that same CPAN link
"Please note that YAML has hardcoded limits on (simple) object key lengths that JSON doesn't have and also has different and incompatible unicode character escape syntax... YAML also does not allow \/ sequences in strings"
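If you want to know whether a given JSON payload actually hits the escape issues from that quote, you can scan for them. Below is a minimal sketch using only Python's stdlib. `yaml_risk_report` is a hypothetical helper, and I'm assuming the "incompatible unicode character escape syntax" refers to JSON's UTF-16 surrogate-pair escapes (e.g. `\ud83d\ude00` for one emoji). Whether either escape actually breaks depends on which YAML version and parser you feed it to.

```python
import json
import re

# One JSON escape: \uXXXX or a backslash plus one character. finditer scans
# left to right without overlap, so "\\/" is read as an escaped backslash
# followed by a plain "/", not as a "\/" escape.
_ESCAPE = re.compile(r'\\(u[0-9a-fA-F]{4}|.)')


def yaml_risk_report(json_text: str) -> dict:
    """Hypothetical helper: count escapes that some YAML parsers have
    reportedly mishandled. Assumes json_text is valid JSON, where
    backslashes can only occur inside strings."""
    json.loads(json_text)  # raises ValueError if the input is not valid JSON
    report = {"slash_escapes": 0, "surrogate_escapes": 0}
    for match in _ESCAPE.finditer(json_text):
        esc = match.group(1)
        if esc == "/":
            report["slash_escapes"] += 1
        elif len(esc) == 5 and 0xD800 <= int(esc[1:], 16) <= 0xDFFF:
            # Half of a UTF-16 surrogate pair, e.g. \ud83d or \ude00.
            report["surrogate_escapes"] += 1
    return report
```

A serializer that never emits `\/` and writes non-BMP characters as raw UTF-8 (e.g. `json.dumps(obj, ensure_ascii=False)`) sidesteps both counts.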
The JSON::XS documentation linked above reports that YAML 1.2 is not a strict superset of JSON:
> Addendum/2009: the YAML 1.2 spec is still incompatible with JSON
The author also details their issues in, ah, getting some of the authors of the YAML specification to agree.
I just checked YAML 1.2, and it seems the 1024-character limit on key length is still in the spec (https://yaml.org/spec/1.2.2/, ctrl+f, 1024). So any JSON with long keys is not compatible with YAML.
The JSON specification [1] states:
> An implementation may set limits on the length and character contents of strings.
So this length limit is not a source of incompatibility with JSON.
[1] https://datatracker.ietf.org/doc/html/rfc7159#section-9
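For what it's worth, here is what such an implementation-defined limit could look like in practice. This is a minimal sketch using the stdlib `json` module's `object_pairs_hook`; `MAX_KEY_LENGTH` and `load_limited` are hypothetical names, and the 1024 value is just borrowed from the YAML discussion, not anything RFC 7159 prescribes.

```python
import json

MAX_KEY_LENGTH = 1024  # hypothetical, implementation-chosen limit


def _check_keys(pairs):
    # object_pairs_hook receives every decoded JSON object (including
    # nested ones) as a list of (key, value) pairs.
    for key, _ in pairs:
        if len(key) > MAX_KEY_LENGTH:
            raise ValueError(f"object key of length {len(key)} exceeds {MAX_KEY_LENGTH}")
    return dict(pairs)


def load_limited(text: str):
    """Parse JSON, rejecting any object key longer than MAX_KEY_LENGTH."""
    return json.loads(text, object_pairs_hook=_check_keys)
```

Both this parser and an unlimited one are conforming per the RFC, which is exactly why "valid JSON" alone doesn't guarantee two implementations will agree.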
Wow! That makes it pretty hard to know you've generated useful JSON, especially if your goal is cross-ecosystem communication.
To be fair, any JSON implementation is going to have a practical limit on the key size; it's just a bit more random and harder to figure out :)
If you mean limited by available memory, then sure but that does not apply just to key size. If you mean something else, could you elaborate?
The 1024 limit is for unquoted keys, which do not occur in JSON
Have a closer look. The 1024 limit in version 1.2 is only for implicit block mapping keys, not for flow style `{"foo": "bar"}`
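To make the implicit vs. flow distinction concrete: an emitter writing block-style YAML only has to worry about the limit for implicit keys, and can fall back to YAML's explicit `? key` / `: value` entry form for anything longer. A minimal sketch, with `yaml_block_entry` as a hypothetical helper. It reuses `json.dumps` for quoting on the assumption that its double-quoted strings are read the same way by the target YAML parser (see the escape caveats upthread), and it stays conservatively under 1024 rather than relying on exact character counting at the boundary.

```python
import json


def yaml_block_entry(key: str, value) -> str:
    """Hypothetical helper: write one block-mapping entry for a
    JSON-compatible key/value pair."""
    quoted_key = json.dumps(key)      # single-line, double-quoted, ASCII-only
    quoted_value = json.dumps(value)  # flow-style JSON value ({...}, [...], scalars)
    # An implicit key's ":" must appear within 1024 characters of the key's
    # start, so short keys use the implicit "key: value" form.
    if len(quoted_key) < 1024:
        return f"{quoted_key}: {quoted_value}"
    # Explicit entries ("? key" then ": value") are not subject to that limit.
    return f"? {quoted_key}\n: {quoted_value}"
```

Values emitted as `{...}` are flow mappings, so their own keys are not restricted this way, which is why plain JSON documents are unaffected.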
In the beginning was the SGML.
Then we said it's too verbose. We named some subsets XML, HTML, XLSX.
Then we said it's still too long. So we named some subsets Markdown, and YML.
Then we said it's still too long, and made JSON.
What's wrong with subsets? Ambiguity in naming things.
https://news.ycombinator.com/item?id=26671136
> Then we said it's too verbose. We named some subsets XML, HTML, XSLX
If anything, XML as an SGML subset is more verbose than SGML proper; in fact, getting rid of markup declarations to yield canonical markup without omitted/inferred tags, shortforms, etc. was the entire point of XML. Of course, XML suffered as an authoring format due to verbosity, which led to the Cambrian explosion of Wiki languages (MediaWiki, Markdown, etc.).
Also, HTML was conceived as an SGML vocabulary/application [1], and for the most part still is [2] (save for mechanisms to smuggle CSS and JavaScript into HTML without the installed base of browsers displaying these as content at the time, plus HTML5's ad-hoc error recovery).
[1]: http://info.cern.ch/hypertext/WWW/MarkUp/MarkUp.html
[2]: http://sgmljs.net/docs/html5.html
Well, Markdown and YML and JSON are not subsets of SGML, nobody claims they are, and nobody intended them as such. So there's that.
While it's true that neither Markdown nor, much less, JSON syntax was intended as an SGML application, that doesn't stop SGML from parsing JSON, Markdown, and other custom Wiki syntax using SHORTREF [1] ;) In fact, the original Markdown language [2] is specified as a mapping to HTML angle-bracket markup (HTML being an SGML vocabulary), so it's quite natural to express that mapping using SGML SHORTREF, even though only a subset can be expressed.
[1]: https://www.balisage.net/Proceedings/vol17/html/Walsh01/Bali...
[2]: https://daringfireball.net/projects/markdown/
First they came for the angle brackets. And I did not speak out. Because I did not use XML...
You didn't use XML? But We use XML to read the comments here on this HTML web page.
But I came for the angle brackets. Because I < We, eternally.
> Then we said it's still too long. So we named some subsets Markdown, and YML.
> Then we said it's still too long, and made JSON.
JSON is older than Markdown and YAML.
Thank you for correcting history! I'd forgotten >_<
I think you'll find that in the beginning were M-expressions, but they were evil, and were followed by S-expressions, which were and are and ever will be good.
SGML and its descendants are okay for document markup.
XML for data (as opposed to markup) is either evil or clown-shoes-for-a-hat insane — I can’t figure out which.
JSON is simultaneously under- and over-specified, leading to systems where everything works right up until it doesn't. It shares a lot with C and Unix in this respect.
If XML for data is bad, check out XML as a programming language. I think this has cropped up a few times; the one that stuck with me was its use for templating structures in the FutureTense app server, before FutureTense was acquired by OpenMarket and they switched to JSPs or something.
Lots of <for something> <other stuff> </for> sorts of evil.
note: HTML5 is not a subset of SGML.