Comment by sevensor

3 days ago

Suppose you have the EDN text

    (
      {
        :name "Fred"
        :age 35
      }
      {
        :name 37
        :age "Wilma"
      }
    )

There's a semantic error here; the name and age fields have been swapped in the second element of the list. At some point, somebody has to check whether :name is a string and :age is a number. If your application is going to do that anyway, why do syntax typing? You might as well just try to construct a number from "Wilma" at the point where you know you need a number.
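
To make the "check it where you need it" point concrete, here is a rough sketch in Python (not Clojure; it assumes the EDN above has already been decoded into plain dicts, and the field names mirror the example):

    # Sketch only: the EDN list decoded into Python dicts.
    records = [
        {"name": "Fred", "age": 35},
        {"name": 37, "age": "Wilma"},  # fields swapped, as in the example
    ]

    def age_of(record):
        # The point-of-use check: try to construct a number where a
        # number is needed, and fail loudly if that is impossible.
        return int(record["age"])

    for r in records:
        try:
            print(age_of(r))
        except (TypeError, ValueError):
            print("not a usable age:", repr(r["age"]))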

Obviously I have an opinion here, but I'm putting it out there in the hope of being contradicted. The whole world seems to run on JSON, and I'm struggling to understand how syntax typing helps with JSON document validation rather than needlessly complicating the syntax.

I guess there are two questions: should the serialization format be coupled with the schema system, and should the serialization format have types.

If you answer the first question with no, then the second question is revealed to just be about various considerations other than validation, such as legibility and obvious mapping to language types (such as having a common notation for symbols/keywords, sets, etc).

JSON and EDN are similar here, if your comment was made in the context of the differences between JSON and EDN. There's some incidental additional checking at the syntax level with EDN, but that's not its purpose.

You can do interesting things with the data even if you don't parse/validate all of it.

E.g. an important feature of the spec schema system and its philosophy is that you don't want closed specs: you want code to be able to handle and pass on data that is richer than what the code knows about, and, if circumstances allow, you shouldn't try to validate all of it in one place.
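
A loose illustration of that "open" style (Python rather than clojure.spec; the function and keys here are made up for the example): validate only the keys this code cares about, and pass everything else through untouched.

    # Hypothetical "open" check: only "name" is this code's business.
    def check_person(record):
        if not isinstance(record.get("name"), str):
            raise ValueError("bad name: " + repr(record.get("name")))
        return record  # richer data flows through unchanged

    # Downstream code may understand "favorite-color" even though this code does not.
    out = check_person({"name": "Fred", "age": 35, "favorite-color": "blue"})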

What do you mean by "syntax typing" and by complications in the syntax?

> The whole world seems to run on JSON

That is true, and I don't like that :)

From my perspective JSON syntax is too "light" and that translates to many complications typically in the form of convention: {"id": {"__MW__type": "LONG NUMBER", "value": "9999999999999999999999999"}}.
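
For context on why such wrappers exist: many JSON decoders map every number to an IEEE-754 double, so a large integer id silently loses precision unless it is smuggled through as a string. A small sketch (Python's own json keeps big integers exact, so parse_int=float is used here only to imitate a double-based decoder; the __MW__type wrapper is just the convention quoted above):

    import json

    doc = '{"id": 9999999999999999999999999}'

    # Imitate a decoder that maps JSON numbers to doubles.
    naive = json.loads(doc, parse_int=float)
    print(naive["id"])  # prints 1e+25 -- the exact value is gone

    # The workaround convention: carry the value as a string plus a type tag.
    wrapped = '{"id": {"__MW__type": "LONG NUMBER", "value": "9999999999999999999999999"}}'
    decoded = json.loads(wrapped)
    print(int(decoded["id"]["value"]))  # exact again, converted at the point of use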

  • > convention: {"id": {"__MW__type": "LONG NUMBER", "value": "9999999999999999999999999"}}.

    Huh. I haven’t run into this, although I totally see the problem. It’s backdooring types into JSON that it doesn’t support. I agree JSON’s number types are weak; it’s been a source of real problems for me. Given that observation, you can go in two directions: have richer types, like EDN, or give up on types in JSON entirely, which is the alternative I’d propose. I need to put my money where my mouth is here and implement something to demonstrate what I’m talking about, but imagine if JSON didn’t have numbers at all. The receiver would have to convert values to numbers after decoding, but I’m arguing that’s fine because in practice you have to check the value’s type anyway before you can use it.

    When I say “syntax typing,” I mean that, for example, 31 is a number and “blue” is a string, and we know that because the string has quotation marks around it and the number is made of decimal digits.
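
    To make the "JSON without numbers" idea above concrete, a minimal sketch (Python; the document and field names are made up): every scalar arrives as a string, and the receiver converts where it actually needs a number.

        import json

        # Hypothetical document in which every scalar is a string.
        person = json.loads('{"name": "Fred", "age": "35"}')

        # Convert at the point of use -- the same check the receiver
        # would have to do anyway before trusting the value.
        age = int(person["age"])  # raises ValueError if "age" is not numeric
        print(age + 1)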

  • > What do you mean by "syntax typing" and by complications in the syntax?

    This question has already been answered by someone else, but I have my own comments about it as well, so I will add them here too.

    EDN does complicate the syntax (so do XML, TER, various extensions of JSON, etc.), but DER (and SDSER, if you want streaming) avoids this problem because the framing is the same for all data types, even though it has many different types and the encoding of the values differs for each type.

    > That [the whole world seems to run on JSON] is true, and I don't like that :)

    I agree with you (well, not everything runs on JSON, but too many things do); I don't like that either.

    > From my perspective JSON syntax is too "light" and that translates to many complications typically in the form of convention

    I agree with you about that too. In this case it is a number (there are problems with the numeric types in JSON), but there are also such things as octet strings, date/time, non-Unicode text, etc.

    > I agree JSON’s number types are weak; it’s been a source of real problems for me. Given that observation, you can go in two directions: have richer types, like EDN, or give up on types in JSON entirely, which is the alternative I’d propose.

    Not only the number type is weak (it is floating point, so there is no proper 64-bit or larger integer type, even though an integer type was added to JavaScript after JSON was invented); the string type is also weak (it cannot hold arbitrary bytes), and so is the key/value list type (keys are only allowed to be strings and cannot be other types).
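
    Each of those weaknesses is easy to demonstrate; a quick sketch in Python (its json module surfaces them directly, except that Python keeps big integers exact, so parse_int=float imitates a double-based decoder):

        import json

        # 1. Numbers: many decoders map everything to a double.
        print(json.loads('{"n": 9007199254740993}', parse_int=float)["n"])  # 9007199254740992.0

        # 2. Strings: arbitrary bytes cannot be represented directly.
        try:
            json.dumps({"blob": b"\x00\xff"})
        except TypeError as err:
            print("bytes are not JSON:", err)

        # 3. Keys: non-string keys are coerced to strings.
        print(json.dumps({1: "one"}))  # {"1": "one"}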

    There are other directions as well; I think DER sits in between, because the framing is the same for all types even though there are many types (you do not have to use all of them; some people seem to dislike DER because they expect that you have to use all of the types, but that is wrong). DER also has the advantage of a canonical form if you need it (DER is already the canonical form); although there is a canonical form for JSON, it is a bit messy, and the canonical form for numbers in JSON in particular is complicated. (There is a minimal sketch of the uniform framing at the end of this comment.)

    If you want to give up on types entirely, then why should you use JSON?
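
    Here is that minimal sketch of the uniform framing (Python; it assumes single-byte tags and definite lengths, which is enough for an illustration and far from a full DER parser):

        def read_tlv(buf, offset=0):
            # Every DER value is framed the same way: tag, length, contents.
            tag = buf[offset]          # one identifier octet (no high tag numbers in this sketch)
            first = buf[offset + 1]
            if first < 0x80:           # short-form length
                length, start = first, offset + 2
            else:                      # long-form length: 0x80 | number of length octets
                n = first & 0x7F
                length = int.from_bytes(buf[offset + 2:offset + 2 + n], "big")
                start = offset + 2 + n
            return tag, buf[start:start + length], start + length

        # An INTEGER and a UTF8String are framed identically: tag, length, value.
        for encoded in (bytes([0x02, 0x01, 0x05]), bytes([0x0C, 0x02]) + b"hi"):
            tag, value, _ = read_tlv(encoded)
            print(hex(tag), value)

    The reader never needs to understand a type in order to skip over or pass along its value, which is the sense in which the many types do not complicate the framing.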