Comment by esrauch

1 month ago

This is very interesting, though the limitations for 'security' reasons seem somewhat surprising to me compared to the claim "Anything JSON can do, it can do. Anything JSON can't do, it can't do.".

Simplest example: "a\u0000b" is a perfectly valid and in-bounds JSON string that valid JSON data sets may contain. Doesn't refusing to serialize that string fall short of "Anything JSON can do, it can do"?

"a\u0000b" ("a" followed by a vertical tabulation control code) is also a perfectly valid and in-bounds BONJSON string. What BONJSON rejects is any invalid UTF-8 sequences, which shouldn't even be present in the data to begin with.

  • You're thinking of "a\u000b". "a\u0000b" is the three-character string also written "a\x00b".

    • Bleh... This is why my text formats use \[10c0de] to escape unicode codepoints. Much easier for humans to parse.

  • My example was a three character string where the second one is \u0000, which is the NUL character in the middle of the string.

    The spec on GitHub says that including NUL is banned on security grounds: after parsing, someone might call strlen on the result in C and accidentally truncate it to a shorter string.

    Which I think has some merit, but it's valid string content in JSON (and in UTF-8), so it is deliberately breaking 1:1 parity with JSON in the name of a security hypothetical.
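
For anyone who wants to check the two points above, here is a minimal Python sketch. It uses the standard json module only as a stand-in for a JSON parser and ctypes as a stand-in for C string handling; nothing here touches BONJSON itself, and the variable names are made up for illustration.

```python
import ctypes
import json

# The two JSON documents that got mixed up in the thread.
vertical_tab = json.loads('"a\\u000b"')   # escape \u000b, i.e. U+000B
embedded_nul = json.loads('"a\\u0000b"')  # escape \u0000 followed by a literal "b"


def codepoints(s):
    """List the code points of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in s]


vt_points = codepoints(vertical_tab)    # two code points: "a", vertical tab
nul_points = codepoints(embedded_nul)   # three code points: "a", NUL, "b"

# A plain JSON round trip keeps the embedded NUL intact.
round_tripped = json.loads(json.dumps(embedded_nul))

# The C hazard: copy the UTF-8 bytes into a char buffer, then read it back
# the way strlen-based code would, stopping at the first NUL byte.
utf8 = embedded_nul.encode("utf-8")
buf = ctypes.create_string_buffer(utf8)  # adds a trailing NUL terminator
as_c_string = buf.value                  # NUL-terminated view of the buffer
full_bytes = buf.raw[:len(utf8)]         # length-aware view of the same bytes

print(vt_points)
print(nul_points)
print(as_c_string, full_bytes)
```

The length-aware view keeps all three bytes, while the NUL-terminated view only sees the first one, which is the truncation the spec is worried about. Whether that justifies rejecting the string outright is the judgment call being argued here.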