Comment by merelysounds

1 month ago

This is a reference to YAML parsing the two-letter ISO country code for Norway:

    country: no

As equivalent to a boolean falsy value:

    country: false

It is a relatively common source of problems. One solution is to quote the value:

    country: "no"

More context: https://www.bram.us/2022/01/11/yaml-the-norway-problem/
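To make the mechanism concrete, here is a rough, stdlib-only sketch of implicit boolean resolution. The `resolve` helper and its `quoted` flag are hypothetical names for illustration, not any library's API. The first pattern is my understanding of the spellings PyYAML's default resolver accepts (the YAML 1.1 type description additionally lists single letters such as `y` and `n`); the second follows the YAML 1.2 core schema.

```python
import re

# Boolean spellings in the YAML 1.1 style (assumed to match PyYAML's default resolver).
YAML_1_1_BOOL = re.compile(
    r"^(?:yes|Yes|YES|no|No|NO|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF)$"
)
# The YAML 1.2 core schema only treats these spellings as booleans.
YAML_1_2_BOOL = re.compile(r"^(?:true|True|TRUE|false|False|FALSE)$")

TRUTHY = {"yes", "true", "on"}


def resolve(scalar: str, quoted: bool = False, pattern=YAML_1_1_BOOL):
    """Hypothetical helper: return a bool for a matching plain scalar, else the string."""
    # Quoted scalars are never implicitly resolved, which is why quoting helps.
    if quoted or not pattern.match(scalar):
        return scalar
    return scalar.lower() in TRUTHY


print(resolve("no"))                         # False
print(resolve("no", quoted=True))            # no
print(resolve("no", pattern=YAML_1_2_BOOL))  # no
```

Under the 1.1-style pattern the plain scalar `no` turns into a boolean, while the quoted form and the 1.2-style pattern both leave it as a string.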

I think it would be better to require quotation marks around all string values, in order to avoid this kind of problem. (It is not the only problem with YAML, but in my opinion any format with multiple types should require strings to be marked explicitly, and YAML (like some other formats) does not.) (If keys are required to be strings, then it can be reasonable to allow unquoted keys, provided the set of characters an unquoted key can contain is restricted and unquoted empty strings are disallowed as keys.)
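As a minimal sketch of what such a rule could look like, assuming a made-up strict format (this is not YAML and not any existing library): the hypothetical `parse_strict_scalar` below accepts only quoted strings, numbers, and the keywords `true`, `false`, and `null`, and rejects every other unquoted token instead of guessing a type.

```python
import re

# Hypothetical strict scalar rules: anything unquoted must be a number or keyword.
_NUMBER = re.compile(r"^-?\d+(?:\.\d+)?$")
_KEYWORDS = {"true": True, "false": False, "null": None}


def parse_strict_scalar(token: str):
    """Parse one scalar; strings must be wrapped in double quotes."""
    token = token.strip()
    if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
        return token[1:-1]  # sketch only: no escape-sequence handling
    if token in _KEYWORDS:
        return _KEYWORDS[token]
    if _NUMBER.match(token):
        return float(token) if "." in token else int(token)
    # An unquoted word is an error rather than a silently guessed type.
    raise ValueError(f"unquoted string not allowed: {token!r}")


print(parse_strict_scalar('"no"'))   # no
print(parse_strict_scalar("false"))  # False
try:
    parse_strict_scalar("no")
except ValueError as exc:
    print(exc)                       # unquoted string not allowed: 'no'
```

With a rule like this, `country: no` fails loudly at parse time instead of quietly becoming a boolean.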

We stopped having this problem over ten years ago when spec 1.1 was implemented. Why are people still harping on about it?

  • Current PyYAML:

      >>> import yaml
      >>> yaml.safe_load("country: NO")
      {'country': False}
    

    Other people did not stop having this problem.

    It might be that there’s some setting that fixes this or some better library that everyone should be switching to, but YAML has nothing that I want and has been a repeated source of footguns, so I haven’t found it worth looking into. (I am vaguely aware that different tools do configure YAML parsing with different defaults, which is actually worse. It’s another layer of complexity on an already unnecessarily complex base language.)

    • The ancient rule of “use software that is updated with bugfixes” certainly applies here.

  • A new spec version doesn’t mean we stop having the problem.

    E.g. Kubernetes wrote about solving this only five months ago[1], by moving from YAML to KYAML, a YAML subset.

    [1]: https://kubernetes.io/blog/2025/07/28/kubernetes-v1-34-sneak...

    • The 1.1 spec was released about _twenty_ years ago; I explicitly used the word _implemented_ for a reason. As in: our YAML lib vendor began officially supporting that version more than ten years ago.

  • Because there's a metric ton of software out there that was built once upon a time and then that bit was never updated. I've seen this issue out in the wild across more industries than I can count.

    • I’m not here cracking down on Java for lacking lambda features. If the problem is that I did not update my Java environment past the 2014 version, that is not a problem with Java.

  • Because once a technology develops a reputation for having a problem it's practically impossible to rehabilitate it.

  • Now add brackets and end-tags, I'll reconsider. ;)

    • Brackets work fine:

          Roles: [editor, product_manager]
      

      End tags, I’m not sure what those are. But three dashes are part of the spec, to separate sections (documents within one stream):

          something:
              setting: true
          ---
          another:
              thing: false
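
The two snippets in the last reply can be checked with PyYAML, the library used earlier in the thread. This is only a quick sketch and assumes PyYAML is installed; `safe_load` and `safe_load_all` are its standard safe loaders, and the variable names are just for illustration.

```python
import yaml  # PyYAML

# Flow-style (bracketed) sequence: loads as an ordinary Python list.
flow = yaml.safe_load("Roles: [editor, product_manager]")
print(flow)  # {'Roles': ['editor', 'product_manager']}

# Three dashes separate documents within a single stream.
stream = """\
something:
    setting: true
---
another:
    thing: false
"""
docs = list(yaml.safe_load_all(stream))
print(len(docs))  # 2
print(docs)       # [{'something': {'setting': True}}, {'another': {'thing': False}}]
```

Note that `safe_load` on the second snippet would raise an error because the stream holds more than one document, so `safe_load_all` is the call to use there.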