Comment by sevensor
4 days ago
My complaint about JSON is that it’s not minimal enough. The receiver always has to validate anyway, so what has syntax typing done for us? Different implementations of JSON disagree about what constitutes a valid value. For instance, is
{"x": NaN}
valid JSON? How about 9007199254740993? Or -.053? If so, will that text round trip through your JSON library without loss of precision? Is that desirable if it does?
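A quick check with Python's standard library (one decoder's answers; other implementations disagree) shows how these cases actually land:

```python
import json
import math

# Python's json module accepts NaN by default -- an extension beyond the spec.
doc = json.loads('{"x": NaN}')
assert math.isnan(doc["x"])

# -.053 is rejected: the JSON grammar requires a digit before the decimal point.
try:
    json.loads("-.053")
    raise AssertionError("should have been rejected")
except json.JSONDecodeError:
    pass

# 9007199254740993 = 2**53 + 1 decodes losslessly as a Python int,
# but it is not representable as a 64-bit float, so a double-based
# decoder would silently round it.
n = json.loads("9007199254740993")
assert n == 2**53 + 1
assert float(n) != n  # the nearest double is 9007199254740992.0
```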
Basically I think formats with syntax typed primitives always run into this problem: even if the encoder and decoder are consistent with each other about what the values are, the receiver still has to decide whether it can use the result. This after all is the main benefit of a library like Pydantic. But if we’re doing all this work to make sure the object is correct, we know what the value types are supposed to be on the receiving end, so why are we making a needlessly complex decoder guess for us?
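To make the point concrete: the decoder picks a host-language type from syntax alone, and the receiver still has to re-check that choice against its own schema. A minimal sketch (the validator here is hypothetical, standing in for what a library like Pydantic does):

```python
import json

# The decoder guesses a Python type from the syntax alone...
assert type(json.loads("1")) is int
assert type(json.loads("1.0")) is float

# ...but the receiver must still validate against its own expectations.
def as_price_cents(raw: str) -> int:
    """Hypothetical receiver-side check: require a non-negative integer."""
    value = json.loads(raw)
    # Note: bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(value, int) or isinstance(value, bool) or value < 0:
        raise ValueError(f"expected a non-negative integer, got {value!r}")
    return value

assert as_price_cents("250") == 250
```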
NaN is not a valid value in JSON. Neither are 0123 or .123 (there must always be at least one digit before the decimal marker, but extraneous leading zeroes are disallowed).
JSON was originally parsed in JavaScript with eval(), which let many things through that aren't JSON, but that doesn't make JSON itself more complex.
That’s my point, though! I’ve run into popular JSON libraries that will emit all of those! 9007199254740993 is problematic because it’s not representable as a 64 bit float. Python’s JSON library is happy to write it, even though you need an int to represent it, and JSON doesn’t have ints.
Edit: I didn’t see my thought all the way through here. Syntax typing invites this kind of nonconformity, because different programming languages mean different things by “number,” “string,” “date,” or even “null.” They will bend the format to match their own semantics, resulting in incompatibility.
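The precision hazard is easy to reproduce: Python emits the integer verbatim, and any consumer that maps every JSON number to a 64-bit double (simulated below with `parse_int=float`) silently rounds it:

```python
import json

n = 2**53 + 1            # 9007199254740993
text = json.dumps(n)     # Python's encoder emits it without complaint
assert text == "9007199254740993"

# A decoder that treats every JSON number as a double loses the value:
as_double = json.loads(text, parse_int=float)
assert as_double == 9007199254740992.0  # rounded to the nearest double
assert as_double != n
```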
> 9007199254740993 is problematic because it’s not representable as a 64 bit float. Python’s JSON library is happy to write it, even though you need an int to represent it
JSON numbers have unlimited range as far as the format standard is concerned, but implementations are explicitly permitted to set limits on the range and precision they generate and handle, and users are warned (RFC 8259, section 6) that good interoperability can only be expected for numbers within the range and precision of an IEEE 754 double.
Also, you don't need an int to represent it: a wide enough integer type works, as do unlimited-precision decimals and wide enough binary floats (of the standard formats, IEEE 754 binary128 works).
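In Python you can see both exact alternatives directly: the default decoder produces an arbitrary-precision int, and `parse_float`/`parse_int` hooks let you swap in `Decimal`:

```python
import json
from decimal import Decimal

text = "9007199254740993"  # 2**53 + 1

# Default: arbitrary-precision int, no loss.
assert json.loads(text) == 2**53 + 1

# Alternative: decode numbers as unlimited-precision decimals, also no loss.
assert json.loads(text, parse_int=Decimal) == Decimal(text)
```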
Before your edit, I was going to object to your premise because it seems like a format could get worse just by more implementations being made.
After your edit, I see that it's rather that syntax-typed formats are prone to this form of implementation divergence.
I don't think this is limited to syntax-typed formats, though. For example, TNetstrings[1] have type tags, with "#" marking an integer. The specification requires that integers fit into 63 bits (since the reference encoder will refuse to encode a Python long), but implementations in C tend to allow 64 bits, and implementations in other languages allow bignums. It does explicitly allow "nan", "inf", and "-inf", FWIW.
1: https://tnetstrings.info/
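The divergence is easy to see in a minimal encoder. This sketch handles only the integer case of the TNetstring format (SIZE:DATA#) and, like the bignum-friendly implementations mentioned above, imposes no 63-bit cap; `tnet_int` is a hypothetical name:

```python
def tnet_int(n: int) -> bytes:
    """Encode an integer as a TNetstring: length, colon, digits, '#' tag."""
    body = str(n).encode("ascii")
    return b"%d:%s#" % (len(body), body)

assert tnet_int(42) == b"2:42#"

# The reference encoder caps integers at 63 bits; a bignum-friendly
# implementation like this one happily exceeds that, producing output
# that stricter decoders will reject.
assert tnet_int(2**64) == b"20:18446744073709551616#"
```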
Yeah I would emit NaN and just hope the receiver handles it.
What's the point of lying about the data?
The format offers you no data type that would not be an outright lie when applied to this data, so you may as well break the format rather than lie.
Your other option is to comply with the spec, emit a string, and expect the receiver to deal with that at validation time rather than parse time.
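Both options fall out of Python's json module directly: the default encoder breaks the spec (allow_nan=True), a strict encoder refuses, and the string workaround pushes the conversion to the receiver:

```python
import json
import math

payload = {"x": float("nan")}

# Option 1: break the spec. Python does this by default (allow_nan=True).
assert json.dumps(payload) == '{"x": NaN}'

# A spec-compliant encoder refuses outright:
try:
    json.dumps(payload, allow_nan=False)
    raise AssertionError("should have been rejected")
except ValueError:
    pass

# Option 2: comply with the spec by encoding NaN as a string,
# leaving the receiver to convert it at validation time.
text = json.dumps({"x": "NaN"})
decoded = json.loads(text)
assert math.isnan(float(decoded["x"]))
```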