Comment by remram
3 years ago
For example, this valid JSON doesn't parse as YAML:
{
"list": [
{},
{}
]
}
(tested on Python)
edit: whitespace didn't quite make it through HN, here:
import json, yaml  # yaml here is PyYAML
json.loads('{\n "list": [\n {},\n\t{}\n ]\n}')
yaml.safe_load('{\n "list": [\n {},\n\t{}\n ]\n}')
Python's .netrc library also hasn't supported comments correctly for like 5 years. The bug was reported, it was never fixed. If I want to use my .netrc file with Python programs, I have to remove all comments (that work with every other .netrc-using program).
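A minimal sketch of that workaround, assuming only full-line `#` comments need to go: filter them out and hand the rest to the standard `netrc` module. The helper name `load_netrc_without_comments` is made up for illustration, and it would also drop `#` lines inside a `macdef` block, so treat it as a stopgap rather than a fix.

```python
import netrc
import os
import tempfile


def load_netrc_without_comments(path):
    """Parse a .netrc file after dropping full-line '#' comments.

    Hypothetical helper: it does not understand macdef blocks or
    trailing comments, only lines whose first non-blank char is '#'.
    """
    with open(path, encoding="utf-8") as src:
        kept = [line for line in src if not line.lstrip().startswith("#")]

    # netrc.netrc() takes a filename, so write the filtered text to a
    # private temporary file and parse that instead of the original.
    fd, tmp_path = tempfile.mkstemp(text=True)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.writelines(kept)
        return netrc.netrc(tmp_path)
    finally:
        os.remove(tmp_path)
```

Calling `load_netrc_without_comments(os.path.expanduser("~/.netrc")).authenticators("example.com")` should then give the usual (login, account, password) tuple without touching the original file.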
It's 2022 and we can't even get a plaintext configuration format from 1980 right.
> It's 2022 and we can't even get a plaintext configuration format from 1980 right.
To me, it's more depressing that we've been at this for 50-60 years and still seemingly don't have an unambiguously good plaintext configuration format at all.
I've been a Professional Config File Wrangler for two decades, and I can tell you that it's always nicer to have a config file that's built to task rather than being forced to tie yourself into knots when somebody didn't want to write a parser.
The difference between a data format and a configuration file is the use case. JSON and YAML were invented to serialize data. They only make sense if they're only ever written programmatically and expressing very specific data, as they're full of features specific to loading and transforming data types, and aren't designed to make it easy for humans to express application-specific logic. Editing them by hand is like walking a gauntlet blindfolded, and then there are the implementation differences due to all the subtle complexity.
Apache, Nginx, X11, RPM, SSHD, Terraform, and other programs have configuration files designed by humans for humans. They make it easy to accomplish tasks specific to those programs. You wouldn't use an INI file to configure Apache, and you wouldn't use an Apache config to build an RPM package. Terraform may need a ton of custom logic and functions, but X11 doesn't (Terraform actually has 2 configuration formats and a data serialization format, and Packer HCL is different than Terraform HCL). Config formats minimize footguns by being intuitive, matching application use case, and avoiding problematic syntax (if designed well). And you'd never use any of them to serialize data. Their design makes the programs more or less complex; they can avoid complexity by supporting totally random syntax for one weird edge case. Design decisions are just as important in keeping complexity down as in keeping good UX.
Somebody could take an inventory of every configuration format in existence, matrix their properties, come up with a couple categories of config files, and then plop down 3 or 4 standards. My guess is there are multiple levels of configuration complexity (INI -> "Unixy" (sudoers, logrotate) -> Apache -> HCL) depending on the app's uses. But that's a lot of work, and I'm not volunteering...
I quite like CUELang (https://cuelang.org/), although it is not yet widely supported.
It strikes a good balance between expressivity and readability: it has enough logic to be useful but not so much that it invites abuse, it can import from and export to YAML and JSON, and it features an elegant type system that lets you define both the schema and the data itself.
I hope it gains traction.
TOML is pretty much the best one I have seen so far, at least for small to medium-sized config files.
We do, it’s called TOML. The future is here, it’s just not equally distributed.
XML is still good.
Hmm, it looks like it’s handled comments for at least a decade:
https://github.com/python/cpython/blame/d75a51bea3c2442f81d3...
Oh, maybe it’s this issue:
https://bugs.python.org/issue34132
If I’ve read it correctly, there was a regression from Python 2.x to 3.x such that you now need to format comments:
Instead of:
(A space after the # isn’t accepted by the parser.)
Great idea.
increases the chances it will work
A note to readers: it's not always a good idea to put automated software installation in a place that users don't expect it.
I've seen that kind of approach cause a ton of issues the moment that the software was used in a different environment than the author expected.
It's much better IMO to fail with a message about how to install the missing dependency.
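A small sketch of that fail-with-a-hint approach, using only the standard library. The `require` helper and its `install_hint` parameter are names invented here for illustration, not part of any existing tool.

```python
import importlib


def require(module_name, install_hint):
    """Import module_name, or exit with instructions if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Fail loudly and tell the user what to do; never install
        # anything on their behalf.
        raise SystemExit(
            f"Missing dependency '{module_name}'. "
            f"Install it with: {install_hint}"
        ) from None
```

A script would call something like `yaml = require("yaml", "pip install pyyaml")` at startup, so a missing package produces one clear line instead of a surprise install or a bare traceback.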
This is why there should be a way to automatically install software into a sandboxed location, e.g. a virtualenv.
Considering that software drives cars today, it should be trivial, and I would say even expected, for software to autonomously "figure out" how to run itself and avoid conflicts with other software; that's a trivial task in comparison to navigating city streets.
Brilliant! What license is this published under?
Free Art License
Tested on python what? I was curious to see what error that produced, figuring it would be some whitespace due to the difference between the list items, but using the yamlized python that I had lying around, it did the sane thing:
produces
With leading tabs it does not work.
This is completely valid YAML.
YAML does not allow tabs in indentation, but the tabs in your example are not indentation according to the YAML spec productions.
You can see it clearly here against many YAML parsers: https://play.yaml.io/main/parser?input=CXsKCQkibGlzdCI6IFsKC...
As tinita points out, sadly PyYAML and libyaml implement this wrong.
See https://matrix.yaml.info/
That's because PyYAML doesn't implement the spec correctly.
Tabs are not valid JSON
Edited with string escapes, the tab didn't make it through HN.
The error from PyYAML 5.3.1:
If it continues to be hard to share, I suggest encoding it as a base64 string so folks can decode it into a file with exactly the right contents.
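For what it's worth, a quick sketch of that base64 round trip in Python, using the document from the top of the thread. The function names `encode_document` and `decode_document` are just illustrative.

```python
import base64


def encode_document(text):
    """Return an ASCII-only base64 string for a text document."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode_document(encoded):
    """Recover the exact original text, tabs and newlines included."""
    return base64.b64decode(encoded).decode("utf-8")


# The JSON document under discussion, with a literal tab before the second {}.
doc = '{\n "list": [\n {},\n\t{}\n ]\n}'
encoded = encode_document(doc)

# The round trip preserves every byte, so the tab survives a paste into a comment box.
assert decode_document(encoded) == doc
print(encoded)
```

Anyone reading can paste the printed string into `decode_document` and write the result to a file to get exactly the same bytes.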
Thanks, I'm finally able to reproduce this.
It would be great if instead of the histrionic message on CPAN (which amusingly accuses others of "mass hysteria"), the author would just say "YAML documents can't start with a tab while JSON documents can, making JSON not a strict subset of YAML".
The YAML spec should be updated to reflect this, but I wonder if a simple practical workaround in YAML parsers (like replacing each tab at the beginning of the document with two spaces before feeding it to the tokenizer) would be sufficient in the short term.
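A rough sketch of that workaround, reading "the beginning of the document" as leading whitespace on every line (my interpretation, not necessarily what was meant). `expand_leading_tabs` is a hypothetical helper. It is only safe for JSON input, where literal tabs and newlines can't appear inside strings; applied to general YAML it could alter block scalars.

```python
import json
import re

# A run of spaces/tabs at the start of any line.
_LEADING_WS = re.compile(r"^[ \t]+", flags=re.MULTILINE)


def expand_leading_tabs(text, width=2):
    """Replace tabs in each line's leading whitespace with spaces."""
    def _expand(match):
        return match.group(0).replace("\t", " " * width)

    return _LEADING_WS.sub(_expand, text)


doc = '{\n "list": [\n {},\n\t{}\n ]\n}'
cleaned = expand_leading_tabs(doc)

# No tabs remain, and the JSON meaning is unchanged.
assert "\t" not in cleaned
assert json.loads(cleaned) == json.loads(doc)
```

The cleaned text could then be passed to a YAML parser that rejects the tab. Tabs elsewhere on a line are left alone, so this does not cover every case.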
This parses fine as YAML in all the tools I've tried. Can you provide the specific versions of the libraries you're using?