Comment by hoherd
3 years ago
With leading tabs it does not work.
$ sed 's/\t/--->/g' break-yaml.json
--->{
--->--->"list": [
--->--->--->{},
--->--->--->{}
--->--->]
--->}
$ jq -c . break-yaml.json
{"list":[{},{}]}
$ yaml-to-json.py break-yaml.json
ERROR: break-yaml.json could not be parsed
while scanning for the next token
found character '\t' that cannot start any token
in "break-yaml.json", line 1, column 1
$ sed 's/\t/ /g' break-yaml.json | yaml-to-json.py
{"list": [{}, {}]}
This is completely valid YAML.
YAML does not allow tabs in indentation, but the tabs in your example are not indentation according to the YAML spec productions.
You can see it clearly here against many YAML parsers: https://play.yaml.io/main/parser?input=CXsKCQkibGlzdCI6IFsKC...
As tinita points out, sadly PyYAML and libyaml implement this incorrectly.
See https://matrix.yaml.info/
That's because PyYAML doesn't implement the spec correctly.
Tabs are not valid JSON.
Do you have a link for that?
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe... says:
> Insignificant whitespace may be present anywhere except within a JSONNumber [forbidden] or JSONString [interpreted as part of the string]
And specifically lists tab as whitespace:
> The tab character (U+0009), carriage return (U+000D), line feed (U+000A), and space (U+0020) characters are the only valid whitespace characters.
More specifically, expanding the grammar in https://datatracker.ietf.org/doc/html/rfc8259#section-2 gives an array as (roughly):
> ws %x5B ws value (ws %x2C ws value)* ws %x5D ws
Here `ws` explicitly includes `%x09` (tab), which seems to cover this case?
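A quick way to check this reading of the grammar, sketched with Python's stdlib `json` module (one parser among many, so this shows how that parser behaves and is not a proof about the spec):

```python
import json

# Same content as break-yaml.json: tabs used only as whitespace between tokens.
doc = '\t{\n\t\t"list": [\n\t\t\t{},\n\t\t\t{}\n\t\t]\n\t}\n'
parsed = json.loads(doc)
print(parsed)

# Each of the four characters listed as JSON whitespace is accepted
# around a value: space, tab, line feed, carriage return.
for ch in " \t\n\r":
    assert json.loads(ch + "[]" + ch) == []

# A character outside that set, such as vertical tab (U+000B), is rejected.
try:
    json.loads("\x0b[]")
    rejected = False
except json.JSONDecodeError:
    rejected = True
print(rejected)
```

This matches the `jq -c .` result earlier in the thread: the tabs sit between tokens, so they are treated as insignificant whitespace.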
Per RFC 8259:
The grammar in https://www.json.org/json-en.html disagrees. It has