
Comment by remram

3 years ago

Edited with string escapes; the tab didn't make it through HN.

The error from PyYAML 5.3.1:

    yaml.scanner.ScannerError: while scanning for the next token
    found character '\t' that cannot start any token
      in "<unicode string>", line 4, column 1

If it continues to be hard to share, I suggest encoding it as a base64 string so folks can decode it into a file with exactly the right contents.

  • Not base64, but this should be easy to reproduce:

      $ printf '{\n\t"list": [\n\t\t{},\n\t\t{}\n\t]\n}\n' > test.json
    
      $ jq < test.json 
      {
        "list": [
          {},
          {}
        ]
      }
    
      $ yamllint test.json 
      test.json
        2:1       error    syntax error: found character '\t' that cannot start any token (syntax)
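For anyone without jq or yamllint at hand, the same comparison can be sketched in Python. This is only a sketch: `try_yaml` is a helper name made up here, and it assumes PyYAML may or may not be installed, so it reports what happened rather than asserting a particular outcome.

```python
import json

# The same tab-indented document that the printf command writes to test.json.
DOC = '{\n\t"list": [\n\t\t{},\n\t\t{}\n\t]\n}\n'

# The json module treats tabs as insignificant whitespace between tokens.
parsed = json.loads(DOC)
print(parsed)


def try_yaml(text):
    """Return (ok, detail) from yaml.safe_load, or (None, reason) if PyYAML is missing.

    Hypothetical helper for illustration only.
    """
    try:
        import yaml  # third-party: PyYAML
    except ImportError:
        return None, "PyYAML not installed"
    try:
        return True, yaml.safe_load(text)
    except yaml.YAMLError as exc:
        return False, str(exc)


print(try_yaml(DOC))
```

If PyYAML rejects the input, the second element of the tuple carries the scanner's error message.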

Thanks, I'm finally able to reproduce this.

It would be great if instead of the histrionic message on CPAN (which amusingly accuses others of "mass hysteria"), the author would just say "YAML documents can't start with a tab while JSON documents can, making JSON not a strict subset of YAML".

The YAML spec should be updated to reflect this, but I wonder if a simple practical workaround in YAML parsers (like replacing each tab at the beginning of the document with two spaces before feeding it to the tokenizer) would be sufficient in the short term.
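A rough sketch of that preprocessing step, in Python. The function name `expand_leading_tabs` and the `width` parameter are made up for illustration, and the sketch assumes the input is JSON: a JSON string cannot contain a raw newline, so a tab at the start of a line is never part of a string value. Arbitrary YAML input would need more care.

```python
import re


def expand_leading_tabs(text, width=2):
    """Replace each tab in a line's leading run of tabs with `width` spaces.

    Tabs that appear after the first non-tab character are left alone.
    """
    return re.sub(
        r"^\t+",
        lambda m: " " * (width * len(m.group())),
        text,
        flags=re.MULTILINE,  # make ^ match at the start of every line
    )


DOC = '{\n\t"list": [\n\t\t{},\n\t\t{}\n\t]\n}\n'
cleaned = expand_leading_tabs(DOC)
print(cleaned)
```

The cleaned text can then be handed to the YAML parser in place of the original.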

  • > "YAML documents can't start with a tab while JSON documents can, making JSON not a strict subset of YAML"

    But YAML can start with tabs. Tabs are allowed as separating whitespace in most of the spec productions but are not allowed as indentation. Even though those tabs look like indentation, the spec productions don't interpret them as such.

    See my comment above, and especially https://play.yaml.io/main/parser?input=CXsKCQkibGlzdCI6IFsKC...

    Note: the YAML spec maintainers (I am one) have identified many issues with YAML which we are actively working on, but (somewhat surprisingly) we have yet to find a case where valid JSON is invalid YAML 1.2.

    • Thanks for the clarification. Let's fix it in PyYAML then :)

      Speaking of PyYAML, I recently ran into an issue where I had to heavily patch PyYAML to prevent its parse result from being susceptible to entity expansion attacks. It would be nice to at least have a PyYAML mode to completely ignore anchors and aliases (as well as tags) using simple keyword arguments. Protection against entity expansion abuse would be nice too.
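One way to approximate that today is sketched below. It is not a PyYAML feature: `NoAliasSafeLoader`, `AliasesDisabledError` and `load_without_aliases` are names invented for this sketch, and it relies on overriding the composer's `compose_node` method in the pure-Python `SafeLoader`. It rejects aliases outright instead of ignoring them, and it does nothing about tags beyond what `SafeLoader` already restricts.

```python
try:
    import yaml  # third-party: PyYAML
except ImportError:  # keep the sketch importable without PyYAML
    yaml = None

if yaml is not None:

    class AliasesDisabledError(yaml.YAMLError):
        """Raised when a document uses an alias (*name). Hypothetical name."""

    class NoAliasSafeLoader(yaml.SafeLoader):
        """SafeLoader variant that refuses to resolve aliases. Hypothetical name."""

        def compose_node(self, parent, index):
            # The composer sees an AliasEvent wherever the document writes *name.
            if self.check_event(yaml.events.AliasEvent):
                raise AliasesDisabledError("aliases are disabled")
            return super().compose_node(parent, index)


def load_without_aliases(text):
    """Parse YAML with the alias-rejecting loader defined above."""
    if yaml is None:
        raise RuntimeError("PyYAML is not installed")
    return yaml.load(text, Loader=NoAliasSafeLoader)
```

Since an anchor that is never referenced cannot expand into anything, refusing aliases is enough to block the expansion itself; anchors on their own are left alone here.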