Comment by zzo38computer
2 months ago
I think JSON is too limited and has some problems, so BONJSON has mostly the same problems. There are many other formats as well, some of which add types beyond JSON's and some of which don't. Also, a few programs may expect (and possibly require) that files can contain invalid UTF-8, even though that is not proper JSON (I think it would be better if they did not use JSON, due to this and other issues), so there is that too. Using normalized Unicode has its own problems, as does allowing 64-bit integers when some programs expect them and others don't. JSON and Unicode are just not good formats, in general. (There is also an issue with JSON.stringify(-0), but that is an issue with JavaScript and does not seem to be relevant to BONJSON, as far as I can tell.)
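For example, the -0 and 64-bit integer problems can both be shown in a few lines of plain TypeScript (this is just an illustration of the JavaScript behaviour, nothing BONJSON-specific):

```typescript
// Negative zero: JSON text has no distinct representation for -0, and
// JSON.stringify collapses it to "0", so the sign is lost on a round trip.
console.log(JSON.stringify(-0));                             // "0"
console.log(Object.is(JSON.parse(JSON.stringify(-0)), -0));  // false

// 64-bit integers: JavaScript numbers are IEEE-754 doubles, so integers
// beyond 2^53 silently lose precision even though the JSON text is exact.
console.log(JSON.parse("9007199254740993"));                 // 9007199254740992
```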
Nevertheless, I believe your claims are mostly accurate, except for a few issues about which things are allowed or not allowed, due to JavaScript and other things (although in some of these cases, the BONJSON specification allows options to control this). Sometimes rejecting certain things is helpful, but not always; for example, sometimes you do want to allow mismatched surrogates, and sometimes you might want to allow null characters. (The defaults are probably reasonable, but such requirements are often the result of a bad design anyway, as I mentioned above.) Also, the top of the specification says it is safe against many attacks, but these are a feature of the implementation, which would also be the case if you implement JSON or other formats (although the BONJSON specification does say that implementations are supposed to check for these things to make them safe).
(The issue of overlong UTF-8 encodings in IIS web servers is another security issue, which comes from using one format for validation and a different one for usage. In this case there are actually two usages, because one of them is the handling of relative URLs (using the ASCII format) and the other is the handling of file names on the server (which might be using UTF-16 here; in addition to that, there is the internal splitting of file paths into individual pieces and the internal handling of relative file paths). There are reasons to avoid and to check for overlong UTF-8 encodings, although that is a different, more general issue than the character encoding itself.)
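(As an illustration only, not how IIS actually implemented anything: the classic overlong form of "/" is the two bytes C0 AF, and the danger is a lenient decoder mapping that to "/" after a "../" check has already passed. A strict decoder such as TextDecoder in fatal mode rejects it outright:

```typescript
// ".." followed by the overlong two-byte encoding of "/" (U+002F).
const overlongSlash = new Uint8Array([0x2e, 0x2e, 0xc0, 0xaf]);

try {
  // fatal: true makes TextDecoder throw on any malformed sequence,
  // including overlong encodings.
  new TextDecoder("utf-8", { fatal: true }).decode(overlongSlash);
} catch (e) {
  console.log("rejected overlong encoding:", (e as Error).message);
}

// Even the default (non-fatal) decoder never produces "/": it substitutes
// U+FFFD for each bad byte, so the overlong form cannot become a separator.
console.log(new TextDecoder("utf-8").decode(overlongSlash)); // ".." + two U+FFFD
```
)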
Another issue is canonical forms; the canonical form of JSON can be messy, especially for numbers (I don't know exactly what the canonical form for numbers in JSON is, but I have read that it is apparently complicated).
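For example (just an illustration of why numbers are the messy part; I believe RFC 8785 handles this by reusing ECMAScript's number-to-string algorithm, but I have not checked the details):

```typescript
// Many distinct JSON spellings denote the same number; a canonical form has
// to pick one. A JavaScript round trip collapses them all to the shortest
// representation that parses back to the same double:
for (const text of ["100", "1e2", "1.0E+2", "100.00"]) {
  console.log(JSON.stringify(JSON.parse(text))); // "100" every time
}

// And values outside double range do not survive at all:
console.log(JSON.stringify(JSON.parse("1e400"))); // "null" (parsed as Infinity)
```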
I think DER is better. BONJSON is more compact, but that also makes the framing more complicated to handle than DER's (which uses consistent framing for all types). I wrote a program to convert JSON to DER (I made up some nonstandard types, although the conversion from JSON to DER only uses one of them (a key/value list); the other types it needs are standard ASN.1 types). Furthermore, DER is already a canonical form (and I made up SDER and SDSER for when you do not want a canonical form but also do not want the messiness of BER; SDSER does have chunking and does not require the length to be known ahead of time, so it is more like BONJSON in those ways). Because of the consistent framing, you can easily ignore any types that you do not use; even though there are many types, you do not necessarily need all of them. (See the sketch below for what I mean about skipping types.)
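A rough sketch (not my actual converter; it assumes a single identifier octet, i.e. low tag numbers, which covers the standard ASN.1 types) of how the consistent tag-length-value framing lets you skip an element you do not recognize:

```typescript
// Every DER element is identifier + definite length + value, so an unknown
// element can be skipped without understanding its type.
function skipDerElement(buf: Uint8Array, offset: number): number {
  offset += 1;                  // identifier octet (class + P/C bit + tag)
  const first = buf[offset++];  // first length octet
  let length = 0;
  if (first < 0x80) {
    length = first;             // short form: length fits in 7 bits
  } else {
    const count = first & 0x7f; // long form: next `count` octets hold the length
    for (let i = 0; i < count; i++) length = length * 256 + buf[offset++];
  }
  return offset + length;       // position of the next element
}

// Example: skip over BOOLEAN TRUE (01 01 FF) to land on the following element.
console.log(skipDerElement(new Uint8Array([0x01, 0x01, 0xff, 0x02, 0x01, 0x2a]), 0)); // 3
```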
Yup, and that's perfectly valid. I'm OK with BONJSON not fitting everyone's use case. For me, safety is far more important than edge cases for systems that require bad data representations. Anyone who needs unsafe things can just stick with JSON (or fix the underlying problems that led to those requirements).
Safe, sane defaults, and some configurability for people who (hopefully) know what they're doing. Falling into success rather than falling into failure.
BONJSON is a small spec, and easy to implement ( https://github.com/kstenerud/ksbonjson/blob/main/library/src... and https://github.com/kstenerud/ksbonjson/blob/main/library/src... ).
It's not the end-all-be-all of data formats; it's just here to make the JSON pipeline suck less.
JSON implementations can be made just as safe, but the issue is that unsafe JSON implementations are still considered valid implementations (and so almost all JSON implementations are unsafe because nobody is an authority on which design is correct). Mandating safety and consistency within the spec is a MAJOR help towards raising the safety of all implementations and avoiding these security vulnerabilities in your infrastructure.
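For example (just a sketch; the key names here are made up), a stock JSON parser will happily do this:

```typescript
// A perfectly "valid" JSON parser accepts duplicate keys without complaint
// and silently keeps the last one; another valid parser may keep the first.
// That inconsistency is exactly what turns into a security bug when two
// systems validate the same document differently.
console.log(JSON.parse('{"role": "user", "role": "admin"}')); // { role: "admin" }

// JSON.parse's reviver only ever sees the surviving value, so it cannot even
// detect the duplicate; a spec that mandates rejection removes the ambiguity.
```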
> Safe, sane defaults, and some configurability for people who (hopefully) know what they're doing.
Yes, I agree (if you want to use it at all; as I have mentioned, you should consider whether you should be using JSON or anything related to it in the first place), although some of the things that you specify as not having options will make it more restrictive than JSON is, even if those restrictions might be reasonable as defaults. One of these is mismatched surrogates (matched surrogate pairs should always be disallowed, but an option to allow mismatched surrogates should be permitted (though not required)). Also, I think that checking for duplicate names probably should not use normalized Unicode. Furthermore, the part that says names MUST NOT be null seems redundant to me, since it already says that names MUST be strings (for compatibility with JSON) and null is not a string.
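To illustrate the normalization point (the particular code points are just an example):

```typescript
// Two names that are different code point sequences but identical after NFC:
// "é" as one code point vs. "e" plus a combining acute accent.
const composed = "\u00e9";
const decomposed = "e\u0301";

console.log(composed === decomposed);                                   // false
console.log(composed.normalize("NFC") === decomposed.normalize("NFC")); // true

// A duplicate-name check that normalizes first would reject this document,
// while a check on the raw code points accepts it with two distinct keys.
console.log(JSON.parse(`{"${composed}": 1, "${decomposed}": 2}`));
```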
> Mandating safety and consistency within the spec is a MAJOR help towards raising the safety of all implementations and avoiding these security vulnerabilities in your infrastructure.
OK, this is a valid point, although there is still the possibility of incorrect implementations (adding test cases would help with that problem, though).