Comment by jerf
3 years ago
YAML was intended to be a superset, but it isn't quite, which is about the worst case scenario. See https://metacpan.org/pod/JSON::XS#JSON-and-YAML , for instance.
(I am an absolutist on this matter. To be a superset, all (that's A L L) valid JSON strings must also be valid YAML. A single failure makes it not a superset. At scale, any difference will eventually occur, which is why even small deviations matter.)
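To make that definition concrete, here is a minimal differential-testing sketch. The function names are hypothetical, and `toy_strict_parser` is an invented stand-in (not a real YAML loader) that rejects tabs, so the example runs with only the standard library. Any input that the smaller language's parser accepts but the larger one rejects or decodes differently is a counterexample, and one is enough.

```python
import json
from typing import Any, Callable, Iterable, List


def superset_counterexamples(
    samples: Iterable[str],
    parse_sub: Callable[[str], Any],
    parse_super: Callable[[str], Any],
) -> List[str]:
    """Return samples the 'subset' parser accepts but the 'superset' parser
    rejects or decodes differently. A single entry refutes the superset claim."""
    counterexamples = []
    for text in samples:
        try:
            expected = parse_sub(text)
        except ValueError:
            continue  # not valid in the smaller language, so it proves nothing
        try:
            actual = parse_super(text)
        except Exception:
            counterexamples.append(text)  # rejected outright
            continue
        # Note: Python's == treats True == 1, so type-level differences need a stricter check.
        if actual != expected:
            counterexamples.append(text)  # accepted, but means something else
    return counterexamples


def toy_strict_parser(text: str) -> Any:
    """Hypothetical stand-in for a YAML loader that refuses tab whitespace."""
    if "\t" in text:
        raise SyntaxError("tab character not allowed here")
    return json.loads(text)


samples = ['{"a": 1}', '{\n\t"a": 1\n}', "[1, 2, 3]"]
print(superset_counterexamples(samples, json.loads, toy_strict_parser))
```

Against a real YAML library you would swap in its loader as `parse_super` and feed it as much real-world JSON as you can find; at scale, the rare cases show up.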
I’ve often heard this (YAML is a superset of JSON) but never looked into the details.
According to https://yaml.org/spec/1.2.2/, YAML 1.2 (from 2009) is a strict superset of JSON. Earlier versions were an _almost_ superset. Hence the confusion in this thread. It depends on the version…
CPAN link provided by the parent says 1.2 still isn't a superset:
> Addendum/2009: the YAML 1.2 spec is still incompatible with JSON, even though the incompatibilities have been documented (and are known to Brian) for many years and the spec makes explicit claims that YAML is a superset of JSON. It would be so easy to fix, but apparently, bullying people and corrupting userdata is so much easier.
Are these documented YAML 1.2 JSON incompatibilities listed / linked to somewhere?
I assume these are something related to non-ascii string encoding / escapes?
The JSON::XS documentation linked above reports that YAML 1.2 is not a strict superset of JSON:
> Addendum/2009: the YAML 1.2 spec is still incompatible with JSON
The author also details their issues in, ah, getting some of the authors of the YAML specification to agree.
I just checked YAML 1.2, and it seems the 1024-character limit on key length is still in the spec (https://yaml.org/spec/1.2.2/, Ctrl+F "1024"). So any JSON with long keys is not compatible with YAML.
The JSON specification [1] states:
> An implementation may set limits on the length and character contents of strings.
So this length limit is not a source of incompatibility with JSON.
[1] https://datatracker.ietf.org/doc/html/rfc7159#section-9
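As a minimal sketch of what an implementation-chosen limit looks like in practice, Python's stdlib `json` lets a reader impose its own key-length cap through `object_pairs_hook`. The 1024 figure is borrowed from the YAML limit mentioned above purely for illustration, and the names are hypothetical; a document this reader rejects is still valid JSON.

```python
import json
from typing import Any, Dict, List, Tuple

# Hypothetical, implementation-chosen limit; RFC 7159 permits one but does not fix a value.
MAX_KEY_LENGTH = 1024


class KeyTooLongError(ValueError):
    """Raised when an object key exceeds this reader's own limit."""


def _enforce_key_limit(pairs: List[Tuple[str, Any]]) -> Dict[str, Any]:
    # json calls this hook once per decoded object, with its (key, value) pairs.
    for key, _ in pairs:
        if len(key) > MAX_KEY_LENGTH:
            raise KeyTooLongError(
                f"object key of {len(key)} characters exceeds limit of {MAX_KEY_LENGTH}"
            )
    return dict(pairs)


def load_with_key_limit(text: str) -> Any:
    return json.loads(text, object_pairs_hook=_enforce_key_limit)
```

The hook runs for nested objects too, so the limit applies at every depth.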
To be fair, any JSON implementation is going to have a practical limit on the key size, it's just a bit more random and harder to figure out :)
1024 limit is for unquoted keys, which do not occur in JSON
Have a closer look. The 1024 limit in version 1.2 is only for implicit block mapping keys, not for flow style `{"foo": "bar"}`
In the beginning was the SGML.
Then we said it's too verbose. We named some subsets XML, HTML, XLSX.
Then we said it's still too long. So we named some subsets Markdown, and YML.
Then we said it's still too long, and made JSON.
What's wrong with subsets? Ambiguity in naming things.
https://news.ycombinator.com/item?id=26671136
> Then we said it's too verbose. We named some subsets XML, HTML, XSLX
If anything, XML as an SGML subset is more verbose than SGML proper; in fact, getting rid of markup declarations to yield canonical markup without omitted/inferred tags, shortforms, etc. was the entire point of XML. Of course, XML suffered as an authoring format due to verbosity, which led to the Cambrian explosion of Wiki languages (MediaWiki, Markdown, etc.).
Also, HTML was conceived as an SGML vocabulary/application [1], and for the most part still is [2] (save for mechanisms to smuggle CSS and JavaScript into HTML without the installed base of browsers displaying these as content at the time, plus HTML5's ad-hoc error recovery).
[1]: http://info.cern.ch/hypertext/WWW/MarkUp/MarkUp.html
[2]: http://sgmljs.net/docs/html5.html
Well, Markdown and YML and JSON are not subsets of SGML, nobody claims they are, and nobody intended them as such. So there's that.
First they came for the angle brackets. And I did not speak out. Because I did not use XML...
> Then we said it's still too long. So we named some subsets Markdown, and YML.
> Then we said it's still too long, and made JSON.
JSON is older than markdown and yaml.
I think you'll find that in the beginning were M-expressions, but they were evil, and were followed by S-expressions, which were and are and ever will be good.
SGML and its descendants are okay for document markup.
XML for data (as opposed to markup) is either evil or clown-shoes-for-a-hat insane — I can’t figure out which.
JSON is simultaneously under- and over-specified, leading to systems where everything works right up until it doesn't. It shares a lot with C and Unix in this respect.
note: HTML5 is not a subset of SGML.
For example, this valid JSON doesn't parse as YAML:
(tested on Python)
edit: whitespace didn't quite make it through HN, here:
Python's .netrc library also hasn't supported comments correctly for like 5 years. The bug was reported, it was never fixed. If I want to use my .netrc file with Python programs, I have to remove all comments (that work with every other .netrc-using program).
It's 2022 and we can't even get a plaintext configuration format from 1980 right.
> It's 2022 and we can't even get a plaintext configuration format from 1980 right.
To me, it's more depressing that we've been at this for 50-60 years and still seemingly don't have an unambiguously good plaintext configuration format at all.
Hmm, it looks like it’s handled comments for at least a decade:
https://github.com/python/cpython/blame/d75a51bea3c2442f81d3...
Oh, maybe it’s this issue:
https://bugs.python.org/issue34132
If I’ve read it correctly, there was a regression from Python 2.x to 3.x such that you now need to format comments:
Instead of:
(A space after the # isn’t accepted by the parser.)
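The original poster's workaround was to remove the comments by hand. A minimal sketch that automates it with the stdlib `netrc` module, assuming only whole-line comments need removing; the function name is hypothetical. It writes a filtered copy to a temporary file because `netrc.netrc()` takes a file path rather than a string.

```python
import netrc
import os
import tempfile


def parse_netrc_without_comments(path: str) -> netrc.netrc:
    """Parse a .netrc after dropping full-line '#' comments (workaround sketch)."""
    with open(path, encoding="utf-8") as f:
        # Only whole-line comments are removed: a '#' later on a line could be
        # part of a password, so the rest of each line is left untouched.
        kept = [line for line in f if not line.lstrip().startswith("#")]
    fd, tmp_path = tempfile.mkstemp(suffix=".netrc")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.writelines(kept)
        return netrc.netrc(tmp_path)
    finally:
        os.unlink(tmp_path)  # the parsed result no longer needs the file
```

Usage: `parse_netrc_without_comments(os.path.expanduser("~/.netrc")).authenticators("example.com")` returns a `(login, account, password)` tuple, or `None` if the host is absent.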
Great idea.
increases the chances it will work
A note to readers: it's not always a good idea to put automated software installation in a place that users don't expect it.
I've seen that kind of approach cause a ton of issues the moment that the software was used in a different environment than the author expected.
It's much better IMO to fail with a message about how to install the missing dependency.
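A minimal sketch of the fail-with-instructions approach; `require` is a hypothetical helper name, and the install command is whatever hint the caller passes in.

```python
import importlib
from types import ModuleType


def require(module_name: str, install_hint: str) -> ModuleType:
    """Import a dependency, or exit with install instructions instead of auto-installing."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # SystemExit with a string prints it to stderr and exits with status 1.
        raise SystemExit(
            f"error: the '{module_name}' module is required but not installed.\n"
            f"Install it with: {install_hint}"
        ) from None
```

For example, `yaml = require("yaml", "python -m pip install pyyaml")`: PyYAML is imported as `yaml` but installed as `pyyaml`, which is exactly the kind of detail worth putting in the message rather than guessing on the user's behalf.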
Brilliant! What license is this published under?
Tested on python what? I was curious to see what error that produced, figuring it would be some whitespace due to the difference between the list items, but using the yamlized python that I had lying around, it did the sane thing:
produces
With leading tabs it does not work.
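The original snippet didn't survive HN's formatting, but going by the replies it relied on leading tabs. A minimal sketch of how such a document arises legitimately: Python's stdlib `json` will emit tab-indented JSON, and it is valid because RFC 7159 (section 2) counts horizontal tab as insignificant whitespace. YAML 1.2, by contrast, forbids tabs in indentation. Whether a particular YAML loader accepts tabs inside a flow collection varies by library and version, so this only demonstrates the JSON side.

```python
import json

data = {"items": [1, 2], "name": "example"}

# json.dumps accepts a string for indent, so this emits one tab per nesting level.
tab_indented = json.dumps(data, indent="\t")

# Tabs are insignificant whitespace in JSON, so the document round-trips.
assert json.loads(tab_indented) == data
print(tab_indented)
```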
Edited with string escapes, the tab didn't make it through HN.
The error from PyYaml 5.3.1:
This parses fine as YAML in all the tools I've tried. Can you provide the specific versions of the libraries you're using?
They should remove the phrase "every JSON file is also a valid YAML file" from the YAML spec. 1) it isn't true, and 2) it seems like it goes against the implication made here:
> This makes it easy to migrate from JSON to YAML if/when the additional features are required.
If JSON interop is provided solely as a short-term solution that eases the transition to YAML, then I applaud the YAML designers for making a great choice.
> YAML was intended to be a superset
My impression was JSON came years after YAML, and it was somehow coincidental that YAML was almost a superset of JSON.
(Shockingly wikipedia tells me they both came out within a month of each other in 2001).
On the upside, if it's almost a superset then a data producer can make sure it is polyglot by sticking to the intersection of the two.
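A minimal sketch of such a producer in Python, roughly following the advice in the JSON::XS documentation linked at the top of the thread: a space after separators, keys noticeably shorter than YAML's 1024-character simple-key limit, no characters outside the Unicode Basic Multilingual Plane, and no `\/` escapes (which Python's `json` never emits). The function names and the 1000-character margin are assumptions.

```python
import json
from typing import Any

# Assumed safety margin below YAML's 1024-character limit on simple keys; arbitrary.
MAX_ENCODED_KEY_LENGTH = 1000


def _check_text(s: str) -> None:
    # Non-BMP characters would be written as surrogate-pair escapes, which
    # (per the JSON::XS notes) YAML does not decode the same way JSON does.
    if any(ord(ch) > 0xFFFF for ch in s):
        raise ValueError("string contains a character outside the Unicode BMP")


def _check(value: Any) -> None:
    if isinstance(value, dict):
        for key, item in value.items():
            if not isinstance(key, str):
                raise TypeError("this sketch only accepts string keys")
            _check_text(key)
            # Measure the key as it appears in the stream, quotes and escapes included.
            if len(json.dumps(key)) > MAX_ENCODED_KEY_LENGTH:
                raise ValueError("object key too long for YAML's simple-key limit")
            _check(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            _check(item)
    elif isinstance(value, str):
        _check_text(value)


def dumps_json_yaml_polyglot(value: Any) -> str:
    """Serialize to JSON restricted to a conservative JSON/YAML intersection."""
    _check(value)
    # Single line, with a space after ':' and ','.
    return json.dumps(value, separators=(", ", ": ")) + "\n"
```

Rejecting input rather than silently rewriting it keeps the producer honest: if the data can't live in the intersection, you find out at write time, not when someone else's YAML parser chokes.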
C++ is not a strict superset of C, but the ability to include C headers is very valuable.
I wasn't able to reproduce any of the issues listed on that page. Does anyone have an example?
I'm not a fan of YAML either, but I think you should not generate YAML files if you can avoid it. All YAML you encounter should be hand-written, so this problem should not occur.
I read "YAML is a superset of JSON" not as a logical statement, but as instructions to humans writing YAML. If you know JSON, you can use that syntax to write YAML. Just like, if you know JavaScript or Python (or to some extent PHP) object syntax, you can write JSON.
If you get a parse error, no biggie, you Alt+Tab to the editor where you are editing the config file and correct it. It is not like you are serving this over the net to some other program.
Same applies to TypeScript. It is not a superset of JavaScript, although many people think it is.
https://stackoverflow.com/a/53698835/
As long as you tell the typescript compiler not to stop when it finds type problems, all JavaScript works and compiles, right? That sounds like a superset to me. Syntactically there are no problems, and the error messages are just messages.
> As long as you tell the typescript compiler not to stop when it finds type problems, all JavaScript works and compiles, right?
Does such code count as valid TypeScript though? It sounds more as if the compiler has an option to accept certain invalid programs.
You could build a C++ compiler with a flag to warn, rather than error, on encountering implicit conversions that are forbidden by the C++ standard. The language the compiler is accepting would then no longer be standard C++, but a superset. (Same for all compiler-specific extensions of course.)
Personally I'm inclined to agree with this StackOverflow comment. [0] It's an interesting edge-case though.
[0] https://stackoverflow.com/questions/29918324/is-typescript-r...
congrats to all involved for sticking to their guns here. specs exist for a reason :D