Comment by mdaniel
3 years ago
Tested on which Python version? I was curious to see what error that produced, figuring it would be a whitespace problem caused by the difference between the list items, but using the yamlized python that I had lying around, it did the sane thing:
PATH=$HOMEBREW_PREFIX/opt/ansible/libexec/bin:$PATH
pip list | grep -i yaml
python -V
python <<'DOIT'
from io import StringIO
import yaml
print(yaml.safe_load(StringIO(
'''
{
"list": [
{},
{}
]
}
''')))
DOIT
produces
PyYAML 6.0
Python 3.10.1
{'list': [{}, {}]}
With leading tabs it does not work.
This is completely valid YAML.
YAML does not allow tabs in indentation, but the tabs in your example are not indentation according to the YAML spec productions.
You can see it clearly here against many YAML parsers: https://play.yaml.io/main/parser?input=CXsKCQkibGlzdCI6IFsKC...
As tinita points out, sadly PyYAML and libyaml implement this incorrectly.
See https://matrix.yaml.info/
That's because PyYAML doesn't implement the spec correctly.
Tabs are not valid JSON
Do you have a link for that?
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe... says:
> Insignificant whitespace may be present anywhere except within a JSONNumber [forbidden] or JSONString [interpreted as part of the string]
And specifically lists tab as whitespace:
> The tab character (U+0009), carriage return (U+000D), line feed (U+000A), and space (U+0020) characters are the only valid whitespace characters.
More specifically, expanding https://datatracker.ietf.org/doc/html/rfc8259#section-2 gives an array as (roughly)
> ws %x5B ws value (ws %x2C ws value)* ws %x5D ws
Where `ws` explicitly includes `%x09`. Which seems to cover this case?
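For anyone who wants to check the `ws` rule against a real parser, here is a minimal sketch using Python's standard `json` module. The tab-indented document is an assumption: it is rebuilt from the `{"list": [{}, {}]}` example earlier in the thread with a literal tab before every line, and may not match the original poster's file byte for byte.

```python
import json

# Assumed reconstruction of the document under discussion: every line,
# including the first, starts with one or more literal tab characters.
doc = '\t{\n\t\t"list": [\n\t\t\t{},\n\t\t\t{}\n\t\t]\n\t}\n'

# Tabs outside strings are insignificant whitespace in JSON, so this parses.
parsed = json.loads(doc)
print(parsed)
```

This only shows that one JSON parser accepts the input. It says nothing about how any particular YAML parser treats the same bytes.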
Per RFC 8259:
The grammar in https://www.json.org/json-en.html disagrees. It has
Edited with string escapes, the tab didn't make it through HN.
The error from PyYAML 5.3.1:
If it continues to be hard to share, I suggest encoding it as a base64 string so folks can decode it into a file with exactly the right contents.
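A minimal sketch of that suggestion in Python, using only the standard library. The document contents are an assumed reconstruction of the tab-indented example, not the poster's exact file.

```python
import base64

# Assumed reconstruction of the tab-indented document from this thread.
doc = '\t{\n\t\t"list": [\n\t\t\t{},\n\t\t\t{}\n\t\t]\n\t}\n'

# Encode to an ASCII-safe string that survives copy and paste.
encoded = base64.b64encode(doc.encode("utf-8")).decode("ascii")
print(encoded)

# The reader decodes it back to the exact original bytes, tabs included.
decoded = base64.b64decode(encoded).decode("utf-8")
```

The decoded text can then be written to a file and fed to whichever parser is being tested.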
This is, unwittingly, the most YAML-relevant comment in this thread.
Not base64, but this should be easy to reproduce:
Thanks, I'm finally able to reproduce this.
It would be great if instead of the histrionic message on CPAN (which amusingly accuses others of "mass hysteria"), the author would just say "YAML documents can't start with a tab while JSON documents can, making JSON not a strict subset of YAML".
The YAML spec should be updated to reflect this, but I wonder if a simple practical workaround in YAML parsers (like replacing each tab at the beginning of the document with two spaces before feeding it to the tokenizer) would be sufficient in the short term.
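A rough sketch of what such a preprocessing step might look like in Python. The function name `expand_leading_tabs` is hypothetical, and applying the replacement to the leading whitespace of every line (rather than only the start of the document) is my own reading of the idea. It is safe for JSON input, where raw newlines cannot occur inside strings, but it could alter content in general YAML, for example inside block scalars.

```python
import json
import re


def expand_leading_tabs(text: str, width: int = 2) -> str:
    """Replace each tab in a line's leading whitespace with `width` spaces.

    Hypothetical helper: tabs that appear after the first non-whitespace
    character of a line are left untouched.
    """
    def repl(match: re.Match) -> str:
        return match.group(0).replace("\t", " " * width)

    # (?m) makes ^ match at the start of every line, not just the document.
    return re.sub(r"(?m)^[ \t]+", repl, text)


# Assumed reconstruction of the tab-indented document from this thread.
doc = '\t{\n\t\t"list": [\n\t\t\t{},\n\t\t\t{}\n\t\t]\n\t}\n'
cleaned = expand_leading_tabs(doc)
print(cleaned)
```

The cleaned text still parses as the same JSON value, and could then be handed to a YAML loader such as the `yaml.safe_load` call used earlier in the thread.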
> "YAML documents can't start with a tab while JSON documents can, making JSON not a strict subset of YAML"
But YAML can start with tabs. Tabs are allowed as separating whitespace in most of the spec productions but are not allowed as indentation. Even though those tabs look like indentation, the spec productions don't interpret them as such.
See my comment above, and especially https://play.yaml.io/main/parser?input=CXsKCQkibGlzdCI6IFsKC...
Note: the YAML spec maintainers (I am one) have identified many issues with YAML which we are actively working on, but (somewhat surprisingly) we have yet to find a case where valid JSON is invalid YAML 1.2.