Comment by jerf

3 years ago

YAML was intended to be a superset, but it isn't quite, which is about the worst-case scenario. See, for instance, https://metacpan.org/pod/JSON::XS#JSON-and-YAML

(I am an absolutist on this matter. To be a superset, all valid JSON strings, and that's A L L of them, must also be valid YAML. A single failure makes it not a superset. At scale, any difference will eventually occur, which is why even small deviations matter.)
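
Because a single counterexample is enough to disprove the superset claim, the check can be mechanised. A minimal sketch, assuming the two parsers are passed in as plain callables; `find_counterexamples` and its parameter names are hypothetical and not part of any library:

```python
import json
from typing import Any, Callable, Iterable, List


def find_counterexamples(
    documents: Iterable[str],
    superset_parse: Callable[[str], Any],
    subset_parse: Callable[[str], Any] = json.loads,
) -> List[str]:
    """Return every document that the subset parser accepts but that the
    claimed-superset parser rejects or parses to a different value."""
    failures = []
    for text in documents:
        try:
            expected = subset_parse(text)
        except ValueError:
            continue  # not valid in the subset language, so it proves nothing
        try:
            actual = superset_parse(text)
        except Exception:
            failures.append(text)  # valid JSON, rejected by the other parser
            continue
        if actual != expected:
            failures.append(text)  # accepted, but with a different meaning
    return failures
```

With PyYAML installed, passing `yaml.safe_load` as `superset_parse` would be one way to run this against real JSON. An empty result only shows that the corpus contained no counterexample, not that none exists.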

I’ve often heard this (YAML is a superset of JSON) but never looked into the details.

According to https://yaml.org/spec/1.2.2/, YAML 1.2 (from 2009) is a strict superset of JSON. Earlier versions were an _almost_ superset. Hence the confusion in this thread. It depends on the version…

  • CPAN link provided by the parent says 1.2 still isn't a superset:

    > Addendum/2009: the YAML 1.2 spec is still incompatible with JSON, even though the incompatibilities have been documented (and are known to Brian) for many years and the spec makes explicit claims that YAML is a superset of JSON. It would be so easy to fix, but apparently, bullying people and corrupting userdata is so much easier.

    • Are these documented YAML 1.2 JSON incompatibilities listed / linked to somewhere?

      I assume these are something related to non-ascii string encoding / escapes?


  • The JSON::XS documentation linked above reports that YAML 1.2 is not a strict superset of JSON:

    > Addendum/2009: the YAML 1.2 spec is still incompatible with JSON

    The author also details their issues in, ah, getting some of the authors of the YAML specification to agree.

  • I just checked YAML 1.2, and it seems the 1024-character length limit on keys is still in the spec (https://yaml.org/spec/1.2.2/, Ctrl+F, 1024). So any JSON with long keys is not compatible with YAML.

  • In the beginning was the SGML.

    Then we said it's too verbose. We named some subsets XML, HTML, XLSX.

    Then we said it's still too long. So we named some subsets Markdown, and YML.

    Then we said it's still too long, and made JSON.

    What's wrong with subsets? Ambiguity in naming things.

    https://news.ycombinator.com/item?id=26671136

    • > Then we said it's too verbose. We named some subsets XML, HTML, XLSX

      If anything, XML as an SGML subset is more verbose than SGML proper; in fact, getting rid of markup declarations to yield canonical markup without omitted/inferred tags, shortforms, etc. was the entire point of XML. Of course, XML suffered as an authoring format due to verbosity, which led to the Cambrian explosion of Wiki languages (MediaWiki, Markdown, etc.).

      Also, HTML was conceived as an SGML vocabulary/application [1], and for the most part still is [2] (save for mechanisms to smuggle CSS and JavaScript into HTML without the installed base of browsers displaying these as content at the time, plus HTML5's ad-hoc error recovery).

      [1]: http://info.cern.ch/hypertext/WWW/MarkUp/MarkUp.html

      [2]: http://sgmljs.net/docs/html5.html


    • Well, Markdown and YML and JSON are not subsets of SGML, nobody claims they are, and nobody intended them as such. So there's that.


    • > Then we said it's still too long. So we named some subsets Markdown, and YML.

      > Then we said it's still too long, and made JSON.

      JSON is older than Markdown and YAML.


    • I think you'll find that in the beginning were M-expressions, but they were evil, and were followed by S-expressions, which were and are and ever will be good.

      SGML and its descendants are okay for document markup.

      XML for data (as opposed to markup) is either evil or clown-shoes-for-a-hat insane — I can’t figure out which.

      JSON is simultaneously under- and over-specified, leading to systems where everything works right up until it doesn't. It shares a lot with C and Unix in this respect.


For example, this valid JSON doesn't parse as YAML:

    {
        "list": [
            {},
                {}
        ]
    }

(tested on Python)

edit: whitespace didn't quite make it through HN, here:

    import json
    import yaml  # PyYAML

    json.loads('{\n  "list": [\n    {},\n\t{}\n    ]\n}')      # parses fine
    yaml.safe_load('{\n  "list": [\n    {},\n\t{}\n    ]\n}')  # reported to raise ScannerError (PyYAML 5.3.1)
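
The difference comes down to the tab: JSON treats it as insignificant whitespace, while the YAML parser in this thread refuses it. A rough stdlib-only sketch for spotting such tabs before handing JSON to a YAML parser; `tabs_outside_strings` is a hypothetical helper, and it assumes the input is already valid JSON:

```python
from typing import List, Tuple


def tabs_outside_strings(json_text: str) -> List[Tuple[int, int]]:
    """Return 1-based (line, column) positions of literal tab characters
    that sit outside JSON string literals."""
    positions = []
    in_string = False
    escaped = False
    line, column = 1, 0
    for char in json_text:
        if char == "\n":
            line, column = line + 1, 0
            continue
        column += 1
        if in_string:
            if escaped:
                escaped = False  # this character was escaped by a backslash
            elif char == "\\":
                escaped = True
            elif char == '"':
                in_string = False
        elif char == '"':
            in_string = True
        elif char == "\t":
            positions.append((line, column))
    return positions
```

For the document above this returns `[(4, 1)]`, the same line and column named in the PyYAML error quoted further down the thread.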

  • Python's .netrc library also hasn't supported comments correctly for like 5 years. The bug was reported, it was never fixed. If I want to use my .netrc file with Python programs, I have to remove all comments (that work with every other .netrc-using program).

    It's 2022 and we can't even get a plaintext configuration format from 1980 right.

  •     try:
            try:
                import orjson as json
            except:
                try:
                    import rapidjson as json
                except:
                    try:
                        import fast_json as json
                    except:
                        import json
            foo = json.loads(string)
        except:
            try:
                import yaml
            except:
                # try harder
                import os
                try:
                    assert(os.system("pip3 install pyyaml") == 0)
                except:
                    # try even harder
                    try:
                        assert(os.system("sudo apt install python3-pip && pip3 install pyyaml") == 0)
                    except:
                        assert(os.system("sudo yum install python3-pip && pip3 install pyyaml") == 0)
                import yaml
            try:
                foo = yaml.safe_load(string)
            except:
                try:
                    ....

    • A note to readers: it's not always a good idea to put automated software installation in a place that users don't expect it.

      I've seen that kind of approach cause a ton of issues the moment that the software was used in a different environment than the author expected.

      It's much better IMO to fail with a message about how to install the missing dependency.


  • Tested on which Python version? I was curious to see what error that produced, figuring it would be a whitespace problem caused by the differing indentation of the list items, but with the YAML-enabled Python I had lying around, it did the sane thing:

        PATH=$HOMEBREW_PREFIX/opt/ansible/libexec/bin:$PATH
        pip list | grep -i yaml
        python -V
        python <<'DOIT'
        from io import StringIO
        import yaml
        print(yaml.safe_load(StringIO(
        '''
            {
                "list": [
                    {},
                        {}
                ]
            }
        ''')))
        DOIT
    

    produces

        PyYAML                6.0
        Python 3.10.1
        {'list': [{}, {}]}

    • With leading tabs it does not work.

        $ sed 's/\t/--->/g' break-yaml.json
        --->{
        --->--->"list": [
        --->--->--->{},
        --->--->--->{}
        --->--->]
        --->}
        $ jq -c . break-yaml.json
        {"list":[{},{}]}
        $ yaml-to-json.py break-yaml.json
        ERROR: break-yaml.json could not be parsed
        while scanning for the next token
        found character '\t' that cannot start any token
          in "break-yaml.json", line 1, column 1
        $ sed 's/\t/    /g' break-yaml.json | yaml-to-json.py
        {"list": [{}, {}]}


    • Edited with string escapes; the tab didn't make it through HN.

      The error from PyYAML 5.3.1:

          yaml.scanner.ScannerError: while scanning for the next token
          found character '\t' that cannot start any token
            in "<unicode string>", line 4, column 1


  • This parses fine as YAML in all the tools I've tried. Can you provide the specific versions of the libraries you're using?
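
On the earlier point about failing with an install message instead of installing packages behind the user's back: a minimal sketch, assuming a hypothetical `require` helper and hint text (neither comes from the thread):

```python
import importlib
from types import ModuleType


def require(module_name: str, install_hint: str) -> ModuleType:
    """Import a module, or fail with instructions instead of installing it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Tell the user what to do; never shell out to pip, apt or yum here.
        raise ImportError(
            f"Missing dependency '{module_name}'. Install it with: {install_hint}"
        ) from exc
```

A caller might write `yaml = require("yaml", "pip install pyyaml")`, so that a missing parser stops the program with a readable instruction and leaves the environment untouched.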

They should remove the phrase "every JSON file is also a valid YAML file" from the YAML spec. 1) it isn't true, and 2) it seems like it goes against the implication made here:

> This makes it easy to migrate from JSON to YAML if/when the additional features are required.

If JSON interop is provided solely as a short-term solution that eases the transition to YAML, then I applaud the YAML designers for making a great choice.

> YAML was intended to be a superset

My impression was JSON came years after YAML, and it was somehow coincidental that YAML was almost a superset of JSON.

(Shockingly, Wikipedia tells me they both came out within a month of each other in 2001.)

On the upside, if it's almost a superset then a data producer can make sure it is polyglot by sticking to the intersection of the two.
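
A producer that wants to stay in that intersection could guard against the two hazards named in this thread: tab indentation and the 1024-character key limit. A hedged sketch, where `dump_polyglot`, `check_keys` and `MAX_KEY_LENGTH` are hypothetical names and any other incompatibilities are not covered:

```python
import json
from typing import Any

MAX_KEY_LENGTH = 1024  # limit on keys cited in this thread from the YAML 1.2.2 spec


def check_keys(value: Any) -> None:
    """Raise ValueError if any encoded object key exceeds MAX_KEY_LENGTH.

    Assumes dict keys are strings, as they are in data loaded from JSON.
    """
    if isinstance(value, dict):
        for key, child in value.items():
            encoded = json.dumps(key)  # the key as it appears in the output
            if len(encoded) > MAX_KEY_LENGTH:
                raise ValueError(f"encoded key is {len(encoded)} characters long")
            check_keys(child)
    elif isinstance(value, (list, tuple)):
        for child in value:
            check_keys(child)


def dump_polyglot(value: Any) -> str:
    """Serialize to single-line JSON whose only whitespace is plain spaces."""
    check_keys(value)
    # Without indent, json.dumps puts one space after "," and ":" and emits
    # no newlines; tabs inside strings are written as the escape \t.
    return json.dumps(value)
```

For example, `dump_polyglot({"list": [{}, {}]})` returns `{"list": [{}, {}]}`. Measuring the encoded key, quotes included, is a deliberately conservative reading of the limit.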

C++ is not a strict superset of C, but the ability to include C headers is very valuable.

I wasn't able to reproduce any of the issues listed on that page. Does anyone have an example?

I'm not a fan of YAML either, but I think you should not generate YAML files if you can avoid it. All YAML you encounter should be hand-written, so this problem should not occur.

I read "YAML is a superset of JSON" not as a logical statement, but as instructions to humans writing YAML. If you know JSON, you can use that syntax to write YAML. Just like, if you know JavaScript or Python (or to some extent PHP) object syntax, you can write JSON.

If you get a parse error, no biggie, you Alt+Tab to the editor where you are editing the config file and correct it. It is not like you are serving this over the net to some other program.

Same applies to TypeScript. It is not a superset of JavaScript, although many people think it is.

https://stackoverflow.com/a/53698835/

  • As long as you tell the TypeScript compiler not to stop when it finds type problems, all JavaScript works and compiles, right? That sounds like a superset to me. Syntactically there are no problems, and the error messages are just messages.

    • > As long as you tell the TypeScript compiler not to stop when it finds type problems, all JavaScript works and compiles, right?

      Does such code count as valid TypeScript though? It sounds more as if the compiler has an option to accept certain invalid programs.

      You could build a C++ compiler with a flag to warn, rather than error, on encountering implicit conversions that are forbidden by the C++ standard. The language the compiler is accepting would then no longer be standard C++, but a superset. (Same for all compiler-specific extensions of course.)

      Personally I'm inclined to agree with this StackOverflow comment. [0] It's an interesting edge-case though.

      [0] https://stackoverflow.com/questions/29918324/is-typescript-r...


congrats to all involved for sticking to their guns here. specs exist for a reason :D