Comment by esrauch

1 month ago

This is very interesting, though the limitations for 'security' reasons seem somewhat surprising to me given the claim "Anything JSON can do, it can do. Anything JSON can't do, it can't do."

Simplest example: "a\u0000b" is a perfectly valid and in-bounds JSON string that valid JSON data sets may contain. Doesn't refusing to serialize that string fall short of "Anything JSON can do, it can do"?

9 comments

kstenerud 1 month ago

"a\u0000b" ("a" followed by a vertical tabulation control code) is also a perfectly valid and in-bounds BONJSON string. What BONJSON rejects are invalid UTF-8 sequences, which shouldn't even be present in the data to begin with.
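
To make "invalid UTF-8 sequence" concrete, here is a minimal Python sketch. It uses Python's built-in strict decoder as a stand-in and is not taken from any BONJSON implementation; `is_valid_utf8` is a hypothetical helper name.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Well-formed: ASCII plus a two-byte sequence for U+00E9.
print(is_valid_utf8("h\u00e9llo".encode("utf-8")))

# Malformed: lead byte 0xC3 followed by a byte that is not a continuation byte.
print(is_valid_utf8(b"\xc3\x28"))

# Malformed: UTF-8-style encoding of the surrogate U+D800.
print(is_valid_utf8(b"\xed\xa0\x80"))
```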

wizzwizz4 1 month ago
You're thinking of "a\u000b". "a\u0000b" is the three-character string also written "a\x00b".
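
The difference is easy to check with any JSON parser. A quick sketch using Python's standard `json` module (the variable names are only for illustration):

```python
import json

# Raw strings keep the backslashes, so the JSON parser sees the escapes.
nul_string = json.loads(r'"a\u0000b"')  # \u0000 escape followed by a literal "b"
vt_string = json.loads(r'"a\u000b"')    # \u000b escape with nothing after it

print(len(nul_string), [hex(ord(c)) for c in nul_string])  # 3 ['0x61', '0x0', '0x62']
print(len(vt_string), [hex(ord(c)) for c in vt_string])    # 2 ['0x61', '0xb']
```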
- kstenerud 1 month ago
  
  Bleh... This is why my text formats use \[10c0de] to escape unicode codepoints. Much easier for humans to parse.
esrauch 1 month ago
My example was a three-character string whose second character is \u0000, i.e. a NUL character in the middle of the string.
The spec on GitHub says that including NUL is banned as a security measure: after parsing, someone might call strlen on the result in C and accidentally truncate it to a shorter string.
I think that concern has some merit, but NUL is valid string content in JSON (and in UTF-8), so this deliberately breaks 1:1 parity with JSON in the name of a security hypothetical.
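
A rough sketch of the truncation being described, using Python's `ctypes` to mimic how C code that treats the buffer as a NUL-terminated string would see it. This only illustrates the general hazard; it is not code from the spec or from any BONJSON implementation.

```python
import ctypes
import json

decoded = json.loads(r'"a\u0000b"')  # valid JSON, three characters
raw = decoded.encode("utf-8")        # b"a\x00b", three bytes of valid UTF-8

# Copy the bytes into a C char array; ctypes appends a terminating NUL.
buf = ctypes.create_string_buffer(raw)

print(len(raw))        # length of the data that was actually parsed
print(buf.raw)         # full buffer contents, including the embedded NUL
print(buf.value)       # what NUL-terminated string handling sees: stops at the first NUL
print(len(buf.value))  # the length a strlen-style scan would report
```

Anything that tracks the length separately from the bytes keeps all three characters; only code that relies on the terminator loses the tail.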
- kstenerud 1 month ago
  
  The spec says that implementations must disable NUL by default (that is, the default configuration must disallow it). https://github.com/kstenerud/bonjson/blob/main/bonjson.md#nu...
  Users can of course enable NUL in the rare cases where they need it, but I want safe defaults.
  Actually, I'll make that section clearer.
  
gritzko 1 month ago

Did you read "Parsing JSON is a minefield"?