← Back to context

Comment by hn_throwaway_99

6 years ago

Tangentially related somewhat-common bug: YAML files will interpret the literal 'no' as boolean false if it's not quoted, instead of as a string.

Many developers have wondered why, when they stuck country-specific configurations in a YAML file, that things suddenly stopped working when they expanded support for Norway.

I always felt Yaml is far too complicated of a format for storing hierarchical data. JSON is too simple (no comments; hard to store multi-line strings).

HCL, the hierarchical data storage language used by Terraform, is the closest thing I’ve seen to a happy medium between JSON and Yaml.

Another option, if the string values are not multi-line, is CommentJSON (use the Python module or write 10 lines of code that strips out comments from JSON if using another language).

YAML has tons of warts like this.

https://yaml.org/type/bool.html

This should be used in schools as an example to illustrate how not to do things.

  • Also as an example of "always deserialize to known types". Flexible boolean values can be convenient since it's relatively human-readable, but "deserialize into [whatever the heck you think is appropriate]" is a problem for quite a few reasons beyond confusion: https://lgtm.com/blog/swagger_snakeyaml_CVE-2017-1000207_CVE... (same techniques have been used against other kinds of serialization in many languages for many years)

  • Every feature is a source of bugs. Be careful when constructing end user affordances for systems that have broad applicability and need to run over a very long time span.

I am one of those developers.

Afterwards I just try to avoid yaml if I can. While it looks cleaner than json, I don’t find it especially easy to read and there is unnecessary ambiguity due to unquoted strings. And it seems to have a thing against Norway ;)

I don't have a problem with yes/no per se , I just don't like that it also takes true/false

  • I don't have a problem with any of those representations, and also no problem with all of them at the same time.

    But not only the value representation keeps the types ambiguous, also there is no off-channel place to disambiguate the types, and no value-independent rules for deciding on the types. If any of those was different, there wouldn't be a problem.

I remember a story when Microsoft translated some ancient version of Internet Explorer for Mac, there was a menu where you could select TLDs (I can't remember what for) and the .no domain ended up getting translated as the word