Comment by _the_inflator

4 days ago

Love it or hate it, JSON's beauty and utility stem from the fact that you have only the fundamental datatypes as a requirement, and that's it.

Structured data that, through nesting, pleases the human eye, reduced to the max in key-value fashion: pure minimalism.

And while you have to write type converters all the time for datetime, BLOBs etc., these converters are the real reason JSON is so useful: every OS or framework does the heavy lifting for you.
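
For illustration, a minimal sketch of such a converter in Python, using the standard json module's default hook (the field names here are invented):

    import json
    from datetime import datetime, timezone

    def encode_extra(obj):
        # json.dumps calls this only for values it cannot serialize itself
        if isinstance(obj, datetime):
            return obj.isoformat()  # datetime -> ISO 8601 string
        raise TypeError(f"not JSON serializable: {type(obj).__name__}")

    doc = {"event": "feeding", "at": datetime.now(timezone.utc)}
    print(json.dumps(doc, default=encode_extra))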

So any elaborate new silver bullet would have to solve the converter/mapper problem, and it can't.

And you can complain or explain with JSON: "Comments not a feature?! WTF!" - Add a field with the key "comment"

Some smart people nevertheless went the extra mile and demanded more, because wouldn't it be nice to have some sort of "strict JSON"? JSON Schema was born.

And here you can visibly experience the inner conflict of "on the one hand" vs. "on the other hand". Applying schemas to JSON is a good and reasonable cause, but guess what happens to the JSON? It turns into unreadable bloat, which is to say, XML.
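
To see why, here is a plausible JSON Schema (draft 2020-12) for a three-field document; the field names are invented, and the schema already dwarfs the data it describes:

    {
      "$schema": "https://json-schema.org/draft/2020-12/schema",
      "type": "object",
      "properties": {
        "dog": {"type": "string"},
        "brand": {"type": "string"},
        "kibble": {"type": "boolean"}
      },
      "required": ["dog", "brand", "kibble"],
      "additionalProperties": false
    }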

Extensibility is fine, and the basic operations serve both demands, simple and sophisticated, without imposing the sophistication on you just for a simple three-field exchange about dog food preferences.

My complaint about JSON is that it’s not minimal enough. The receiver always has to validate anyway, so what has syntax typing done for us? Different implementations of JSON disagree about what constitutes a valid value. For instance, is

    {"x": NaN}

valid JSON? How about 9007199254740993? Or -.053? If so, will that text round trip through your JSON library without loss of precision? Is that desirable if it does?

Basically I think formats with syntax-typed primitives always run into this problem: even if the encoder and decoder are consistent with each other about what the values are, the receiver still has to decide whether it can use the result. This, after all, is the main benefit of a library like Pydantic. But if we’re doing all this work to make sure the object is correct, we already know what the value types are supposed to be on the receiving end, so why are we making a needlessly complex decoder guess for us?
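
A sketch of that receiver-side check with Pydantic (v2 API; the Measurement model is invented for illustration): the decoder hands over whatever it parsed, and the declared types decide whether it is usable.

    from pydantic import BaseModel, ValidationError

    class Measurement(BaseModel):
        x: float  # the receiver declares what it expects here

    print(Measurement.model_validate_json('{"x": 1.25}'))  # x=1.25
    try:
        Measurement.model_validate_json('{"x": "oops"}')
    except ValidationError:
        print("rejected on the receiving end")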

  • NaN is not a valid value in JSON. Neither are 0123 or .123 (there must always be at least one digit before the decimal marker, but extraneous leading zeroes are disallowed).

    JSON was originally parsed in JavaScript with eval(), which let many things through that aren't JSON, but that doesn't make JSON itself more complex.
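
    A quick check with CPython's standard json module (current behavior, as far as I know) shows how far a real parser can drift from that grammar:

        import json

        print(json.loads('{"x": NaN}'))  # accepted: {'x': nan}, even though NaN is not valid JSON
        for text in ('0123', '.123'):    # both are outside the JSON grammar
            try:
                json.loads(text)
            except json.JSONDecodeError:
                print(text, 'rejected')  # the same module that accepts NaN rejects these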

    • That’s my point, though! I’ve run into popular JSON libraries that will emit all of those! 9007199254740993 is problematic because it’s not representable as a 64-bit float. Python’s JSON library is happy to write it, even though you need an int to represent it, and JSON doesn’t have ints.

      Edit: I didn’t see my thought all the way through here. Syntax typing invites this kind of nonconformity, because different programming languages mean different things by “number,” “string,” “date,” or even “null.” They will bend the format to match their own semantics, resulting in incompatibility.
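
      To make the first point concrete (CPython's json module; the JavaScript value is what double-precision rounding gives you):

          import json

          print(json.dumps(9007199254740993))    # '9007199254740993' - written out verbatim
          print(json.loads('9007199254740993'))  # 9007199254740993 - read back exactly, as a Python int
          # a decoder built on 64-bit floats cannot do that:
          # JSON.parse('9007199254740993') === 9007199254740992 in JavaScript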

    • Yeah I would emit NaN and just hope the receiver handles it.

      What's the point of lying about the data?

      The format offers no data type that wouldn't be an outright lie when applied to this data, so you may as well break the format rather than lie.
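
      For what it's worth, that is what Python's json module does by default; you have to opt out of the lie:

          import json

          print(json.dumps(float('nan')))  # 'NaN' - not valid JSON, receiver beware
          try:
              json.dumps(float('nan'), allow_nan=False)
          except ValueError:
              print('strict mode refuses to emit it')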

> you have only the fundamental datatypes as a requirement

Not really; the set of datatypes has problems. Strings are Unicode only: there is no type for binary data, nor for non-Unicode text. Numbers are usually interpreted as floating point rather than integers, which can also be a problem. Keys can only be strings. And there are other problems besides. So the data types are not very good.

And since it is a text format, escaping is required.

> And while you have to write type converters all the time for datetime, BLOBs etc.

Not having a proper data type for binary means that you have to encode it using some other type, which defeats the benefit of JSON anyway. So I think JSON is not as helpful.
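
A sketch of the usual workaround, which is exactly the converter problem from above: the bytes masquerade as a string, and both sides have to agree out of band that the field is base64.

    import base64, json

    blob = bytes([0, 255, 16, 32])
    wire = json.dumps({"data": base64.b64encode(blob).decode("ascii")})
    # the receiver has to know that "data" is base64, not ordinary text:
    assert base64.b64decode(json.loads(wire)["data"]) == blob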

I think DER is better (you do not have to use all of the types; you only need to implement the types you actually use, because the format of DER makes it possible to skip anything you do not care about), and I made up TER, a text-based format that can be converted to DER (so even though binary data is written as text, it still represents the binary data type, rather than having to use the wrong data type as JSON does).
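
The skipping works because every DER element is tag-length-value, so a reader can jump over anything it does not understand without decoding it. A simplified sketch (it assumes single-byte tags; DER always uses definite lengths):

    def skip_der_element(buf: bytes, i: int) -> int:
        """Return the index just past the DER element starting at buf[i]."""
        i += 1                       # identifier octet (assuming a single-byte tag)
        first = buf[i]; i += 1
        if first < 0x80:             # short form: this octet is the length
            length = first
        else:                        # long form: low 7 bits give the number of length octets
            n = first & 0x7F
            length = int.from_bytes(buf[i:i + n], "big")
            i += n
        return i + length            # jump over the content without decoding it

    buf = bytes.fromhex("0201050c03646f67")  # INTEGER 5, then UTF8String "dog"
    i = skip_der_element(buf, 0)             # skip the INTEGER without decoding it
    assert buf[i] == 0x0C                    # now positioned at the UTF8String tag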

> And you can complain or explain with JSON: "Comments not a feature?! WTF!" - Add a field with the key "comment"

But then it is a part of the data, which you might not want.

CBOR (and MsgPack) still embrace that simplicity. They provide the same kinds of key-value maps, lists, and basic values.

However, the types are more precise, allowing you to differentiate between, say, int32s and int64s, or between strings and bytes.

Essentially you can replace JSON with either and gain performance and less ambiguity, with the same flexibility. You do need an extra step to print CBOR in human-readable form, but it has a standardized human-readable form (diagnostic notation) that reads like a typed JSON.
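
A sketch with the third-party cbor2 package (assuming it is installed), showing the bytes-vs-text distinction that JSON cannot express:

    import cbor2

    doc = {"name": "rex", "blob": b"\x00\xff"}  # text string and byte string stay distinct
    wire = cbor2.dumps(doc)                     # compact binary encoding
    assert cbor2.loads(wire) == doc             # round-trips with the types intact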