Comment by remram

3 years ago

For example, this valid JSON doesn't parse as YAML:

    {
        "list": [
            {},
                {}
        ]
    }

(tested on Python)

edit: whitespace didn't quite make it through HN, here:

    import json, yaml
    json.loads('{\n  "list": [\n    {},\n\t{}\n    ]\n}')      # parses fine
    yaml.safe_load('{\n  "list": [\n    {},\n\t{}\n    ]\n}')  # ScannerError (PyYAML 5.3.1)


0xbadcafebee 3 years ago

Python's .netrc library also hasn't supported comments correctly for like 5 years. The bug was reported but never fixed. If I want to use my .netrc file with Python programs, I have to remove all the comments (which work with every other .netrc-using program).

It's 2022 and we can't even get a plaintext configuration format from 1980 right.

macintux 3 years ago
> It's 2022 and we can't even get a plaintext configuration format from 1980 right.
To me, it's more depressing that we've been at this for 50-60 years and still seemingly don't have an unambiguously good plaintext configuration format at all.
- 0xbadcafebee 3 years ago
  
  I've been a Professional Config File Wrangler for two decades, and I can tell you that it's always nicer to have a config file that's built to task rather than being forced to tie yourself into knots when somebody didn't want to write a parser.
  The difference between a data format and a configuration file is the use case. JSON and YAML were invented to serialize data. They only make sense if they're written programmatically and express very specific data, as they're full of features specific to loading and transforming data types, and aren't designed to make it easy for humans to express application-specific logic. Editing them by hand is like walking a gauntlet blindfolded, and then there are the implementation differences due to all the subtle complexity.
  Apache, Nginx, X11, RPM, SSHD, Terraform, and other programs have configuration files designed by humans for humans. They make it easy to accomplish tasks specific to those programs. You wouldn't use an INI file to configure Apache, and you wouldn't use an Apache config to build an RPM package. Terraform may need a ton of custom logic and functions, but X11 doesn't (Terraform actually has 2 configuration formats and a data serialization format, and Packer HCL is different than Terraform HCL). Config formats minimize footguns by being intuitive, matching application use case, and avoiding problematic syntax (if designed well). And you'd never use any of them to serialize data. Their design makes the programs more or less complex; they can avoid complexity by supporting totally random syntax for one weird edge case. Design decisions are just as important in keeping complexity down as in keeping good UX.
  Somebody could take an inventory of every configuration format in existence, matrix their properties, come up with a couple of categories of config files, and then plop down 3 or 4 standards. My guess is there are multiple levels of configuration complexity (INI -> "Unixy" (sudoers, logrotate) -> Apache -> HCL) depending on the app's uses. But that's a lot of work, and I'm not volunteering...
- BiteCode_dev 3 years ago
  
  I quite like CUELang (https://cuelang.org/), although it's not yet widely supported.
  It has a good balance between expressivity and readability: it has enough logic to be useful, but not so much that it invites abuse, it can import/export YAML and JSON, and it features an elegant type system that lets you define both the schema and the data itself.
  I hope it gains traction.
- chillfox 3 years ago
  
  toml is pretty much the best one I have seen so far. At least for small to medium size config files.
- zarzavat 3 years ago
  
  We do, it’s called TOML. The future is here; it’s just not equally distributed.
- stickfigure 3 years ago
  
  XML is still good.
js2 3 years ago
Hmm, it looks like it’s handled comments for at least a decade:
https://github.com/python/cpython/blame/d75a51bea3c2442f81d3...
Oh, maybe it’s this issue:
https://bugs.python.org/issue34132
If I’ve read it correctly, there was a regression from Python 2.x to 3.x such that comments now have to be written as:

    #like this

instead of:

    # like this

(A space after the # isn’t accepted by the parser.)
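Until the parser handles comments, the manual workaround described above (removing the comments before Python sees the file) can be automated. A minimal sketch, assuming only full-line `#` comments need dropping; the helper name `load_netrc_without_comments` is hypothetical. It leaves trailing comments in place, and it would also drop `#`-prefixed lines inside `macdef` bodies.

```python
import netrc
import os
import tempfile


def load_netrc_without_comments(path):
    """Hypothetical helper: parse a .netrc after dropping full-line comments.

    Lines whose first non-blank character is '#' are removed. Trailing
    comments are kept, and '#' lines inside macdef bodies are dropped too.
    """
    with open(path) as f:
        kept = [line for line in f if not line.lstrip().startswith("#")]
    # netrc.netrc() takes a filename, so write the filtered text to a temp file.
    fd, tmp_path = tempfile.mkstemp(suffix=".netrc")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.writelines(kept)
        return netrc.netrc(tmp_path)
    finally:
        os.remove(tmp_path)


# Usage with a throwaway file; the host and credentials are made up.
sample = (
    "# personal credentials\n"
    "machine example.com\n"
    "    login alice\n"
    "    password s3cret\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".netrc", delete=False) as sample_file:
    sample_file.write(sample)
try:
    parsed = load_netrc_without_comments(sample_file.name)
    login, _account, password = parsed.authenticators("example.com")
    print(login, password)  # alice s3cret
finally:
    os.remove(sample_file.name)
```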

dheera 3 years ago

    try:
        try:
            import orjson as json
        except:
            try:
                import rapidjson as json
            except:
                try:
                    import fast_json as json
                except:
                    import json
        foo = json.loads(string)
    except:
        try:
            import yaml
        except:
            # try harder
            import os
            try:
                assert(os.system("pip3 install pyyaml") == 0)
            except:
                # try even harder
                try:
                    assert(os.system("sudo apt install python3-pip && pip3 install pyyaml") == 0)
                except:
                    assert(os.system("sudo yum install python3-pip && pip3 install pyyaml") == 0)
            import yaml
        try:
            foo = yaml.safe_load(string)
        except:
            try:
                ...  # and so on

tomjakubowski 3 years ago
Great idea.

    pip install --user pyyaml

increases the chances it will work.
nrclark 3 years ago
A note to readers: it's not always a good idea to put automated software installation in a place that users don't expect it.
I've seen that kind of approach cause a ton of issues the moment that the software was used in a different environment than the author expected.
It's much better IMO to fail with a message about how to install the missing dependency.
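A minimal sketch of that approach: import the dependency, and if it is missing, exit with an install hint instead of installing anything. The helper `require` and its `pip_name` parameter are hypothetical; the second argument exists because a PyPI name can differ from the import name (PyYAML is installed as `pyyaml` but imported as `yaml`).

```python
import importlib


def require(module_name, pip_name=None):
    """Hypothetical helper: import a module or exit with an install hint.

    pip_name covers packages whose PyPI name differs from the import name,
    e.g. PyYAML is installed as "pyyaml" but imported as "yaml".
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        hint = pip_name or module_name
        raise SystemExit(
            f"Missing dependency '{module_name}'. "
            f"Install it with: python3 -m pip install {hint}"
        ) from None


json = require("json")  # stdlib, always importable
# yaml = require("yaml", "pyyaml")  # exits with an install hint if PyYAML is absent
```

The program stops with a readable message rather than touching the user's environment, which leaves the choice of `pip --user`, a virtualenv, or a distro package to whoever runs it.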
- dheera 3 years ago
  
  This is why there should be a way to automatically install software into a sandboxed location, e.g. a virtualenv.
  Considering we are having software drive cars today it should be trivial and I would say even arguably expected that software should be able to autonomously "figure out" how to run itself and avoid conflicts with other software since that's a trivial task in comparison to navigating city streets.
imoverclocked 3 years ago
Brilliant! What license is this published under?
- dheera 3 years ago
  
  Free Art License

mdaniel 3 years ago

Tested on which Python? I was curious to see what error that produced, figuring it would be a whitespace issue caused by the different indentation of the two list items, but the yamlized Python I had lying around did the sane thing:

    PATH=$HOMEBREW_PREFIX/opt/ansible/libexec/bin:$PATH
    pip list | grep -i yaml
    python -V
    python <<'DOIT'
    from io import StringIO
    import yaml
    print(yaml.safe_load(StringIO(
    '''
        {
            "list": [
                {},
                    {}
            ]
        }
    ''')))
    DOIT

produces

    PyYAML                6.0
    Python 3.10.1
    {'list': [{}, {}]}

hoherd 3 years ago
With leading tabs it does not work.

    $ sed 's/\t/--->/g' break-yaml.json
    --->{
    --->--->"list": [
    --->--->--->{},
    --->--->--->{}
    --->--->]
    --->}
    $ jq -c . break-yaml.json
    {"list":[{},{}]}
    $ yaml-to-json.py break-yaml.json
    ERROR: break-yaml.json could not be parsed
    while scanning for the next token
    found character '\t' that cannot start any token
      in "break-yaml.json", line 1, column 1
    $ sed 's/\t/ /g' break-yaml.json | yaml-to-json.py
    {"list": [{}, {}]}
- ingy 3 years ago
  
  This is completely valid YAML.
  YAML does not allow tabs in indentation, but the tabs in your example are not indentation according to the YAML spec productions.
  You can see it clearly here against many YAML parsers: https://play.yaml.io/main/parser?input=CXsKCQkibGlzdCI6IFsKC...
  As tinita points out, sadly PyYAML and libyaml implement this wrong.
  See https://matrix.yaml.info/
- tinita 3 years ago
  
  That's because PyYAML doesn't implement the spec correctly.
- MrPatan 3 years ago
  
  Tabs are not valid JSON
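For reference, RFC 8259 defines JSON whitespace as space, horizontal tab, line feed, or carriage return, so tabs between tokens are valid. What JSON forbids is an unescaped tab inside a string value, since U+0009 is a control character that must be escaped. Python's stdlib `json` module (which is strict about control characters by default) shows both cases:

```python
import json

# Tabs between tokens are ordinary JSON whitespace.
print(json.loads('{\n\t"list": [\n\t\t{},\n\t\t{}\n\t]\n}'))  # {'list': [{}, {}]}

# A raw tab inside a string is a control character; strict parsing rejects it.
try:
    json.loads('"a\tb"')
except json.JSONDecodeError as err:
    print("rejected:", err.msg)

# The JSON escape \t (written '\\t' in this Python literal) is the valid form.
print(repr(json.loads('"a\\tb"')))  # 'a\tb'
```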
remram 3 years ago
I edited it to use string escapes; the tab didn't make it through HN.
The error from PyYAML 5.3.1:

    yaml.scanner.ScannerError: while scanning for the next token
    found character '\t' that cannot start any token
      in "<unicode string>", line 4, column 1
- notreallyserio 3 years ago
  
  If it continues to be hard to share, I suggest encoding it as a base64 string so folks can decode it into a file with exactly the right contents.
- ak217 3 years ago
  
  Thanks, I'm finally able to reproduce this.
  It would be great if instead of the histrionic message on CPAN (which amusingly accuses others of "mass hysteria"), the author would just say "YAML documents can't start with a tab while JSON documents can, making JSON not a strict subset of YAML".
  The YAML spec should be updated to reflect this, but I wonder if a simple practical workaround in YAML parsers (like replacing each tab at the beginning of the document with two spaces before feeding it to the tokenizer) would be sufficient in the short term.
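A minimal sketch of the base64 suggestion, using Python's stdlib `base64` module: the sender encodes the exact bytes of the test document, and a reader decodes them back into a file with the tab intact. The file name `break-yaml.json` is borrowed from the shell session above and is only an example.

```python
import base64
import os
import tempfile

# remram's test document, with the literal tab on line 4.
doc = '{\n  "list": [\n    {},\n\t{}\n    ]\n}'

# Sender: encode the exact bytes as plain ASCII that survives copy/paste.
encoded = base64.b64encode(doc.encode("utf-8")).decode("ascii")
print(encoded)

# Receiver: decode back into a file with exactly the same contents.
path = os.path.join(tempfile.mkdtemp(), "break-yaml.json")
with open(path, "wb") as f:
    f.write(base64.b64decode(encoded))

with open(path, "rb") as f:
    restored = f.read().decode("utf-8")
print(restored == doc)  # True
```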
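The pre-processing workaround suggested above can be sketched as follows, reading "tabs at the beginning" as tabs in each line's leading whitespace. This assumes the input is JSON: JSON strings cannot contain raw newlines, so whitespace at the start of a line is never part of a value and rewriting it cannot change the data. It is not safe for arbitrary YAML, where leading whitespace can be content (for example inside block scalars). The function name `expand_leading_tabs` and the two-space default are assumptions.

```python
import json
import re

# Matches the run of spaces/tabs at the start of every line.
_LEADING_WS = re.compile(r"^[ \t]+", re.MULTILINE)


def expand_leading_tabs(text, width=2):
    """Replace each tab in a line's leading whitespace with `width` spaces.

    Intended for JSON input only: JSON strings cannot span lines, so
    leading whitespace is never part of a value.
    """
    return _LEADING_WS.sub(lambda m: m.group(0).replace("\t", " " * width), text)


src = '{\n  "list": [\n    {},\n\t{}\n    ]\n}'
fixed = expand_leading_tabs(src)
print(repr(fixed))
# Only insignificant whitespace changed, so the JSON value is the same.
print(json.loads(fixed) == json.loads(src))  # True
```

With PyYAML installed, `yaml.safe_load(fixed)` should then get past the tab error, since the result is the same space-indented shape that parsed cleanly in the PyYAML 6.0 test above.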

ak217 3 years ago

This parses fine as YAML in all the tools I've tried. Can you provide the specific versions of the libraries you're using?