Comment by sevensor

3 days ago

I don’t wish to pick on this post; it looks quite well done. However, in general I have some doubts about data formats with typed primitives: JSON, TOML, ASN.1, what have you. There’s very little you can do with the data until you apply a schema, so why decode before then? The schema tells you what type you need anyway, so why add syntax complexity if you have to double-check the result of parsing?

I think it depends on what you intend to do with the data (which is true for all of the formats you mentioned); not everyone will do the same thing with it, even with the same file. It can be helpful for other programs that do not know the schema to still be able to parse the data. (That is not always possible when using IMPLICIT types in ASN.1, which is one reason to use EXPLICIT instead, although each has advantages and disadvantages; in DER, however, all types use the same framing, so the framing can be parsed even when the specific type cannot be understood by the reader.) Embedded types can also help if the schema is later extended to use types other than the ones that were originally expected. (I prefer to use ASN.1 DER in my stuff, although JSON and other formats are also used by programs that were made by someone else.)
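A minimal Python sketch of that framing point: every DER value is a tag-length-value triple, so a reader can walk the structure even when it does not understand the specific types. This is illustrative code, not a full DER parser, and it assumes definite-length encoding (which DER requires).

```python
def parse_tlv(data, offset=0):
    """Parse one DER tag-length-value triple starting at offset.

    Returns (tag, value_bytes, next_offset). Assumes single-byte tags
    and definite-length encoding, as DER mandates.
    """
    tag = data[offset]
    length = data[offset + 1]
    offset += 2
    if length & 0x80:  # long form: low 7 bits give the byte count of the length
        n = length & 0x7F
        length = int.from_bytes(data[offset:offset + n], "big")
        offset += n
    return tag, data[offset:offset + length], offset + length

# A SEQUENCE containing an INTEGER (5) and a UTF8String ("hi"), hand-encoded:
blob = bytes([0x30, 0x07, 0x02, 0x01, 0x05, 0x0C, 0x02, 0x68, 0x69])
tag, body, _ = parse_tlv(blob)
assert tag == 0x30  # SEQUENCE -- the framing is visible without any schema
inner_tag, inner_val, nxt = parse_tlv(body)
assert (inner_tag, inner_val) == (0x02, b"\x05")  # INTEGER 5
```

A reader that does not know what tag 0x0C means can still skip over that value correctly, which is exactly the "same framing" property described above.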

  • > It might be helpful to know from other programs that do not know this schema to be able to parse the data

    OK that’s a really interesting question: if you’re interpreting a text without knowing what it’s about, having type information embedded in it could help clarify the writer’s intent? That seems reasonable. Have you done this?

I can do a lot without applying a schema at all. For that I only need the handful of types defined in the EDN specification and the Clojure programming language.

  • Suppose you have the EDN text

        (
          {
            :name "Fred"
            :age 35
          }
          {
            :name 37
            :age "Wilma"
          }
        )
    

    There's a semantic error here; the name and age fields have been swapped in the second element of the list. At some point, somebody has to check whether :name is a string and :age is a number. If your application is going to do that anyway, why do syntax typing? You might as well just try to construct a number from "Wilma" at the point where you know you need a number.

    Obviously I have an opinion here, but I'm putting it out there in the hope of being contradicted. The whole world seems to run on JSON, and I'm struggling to understand how syntax typing helps with JSON document validation rather than needlessly complicating the syntax.
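The check described above can be sketched in plain Python over the already-parsed records (the data mirrors the EDN example; the `valid` helper is my own illustrative name):

```python
# Hedged sketch: validate at the point of use, regardless of whether
# the wire format carried type tags. The records mirror the EDN example.
people = [
    {"name": "Fred", "age": 35},
    {"name": 37, "age": "Wilma"},  # fields swapped: the semantic error
]

def valid(person):
    # The application knows the schema, so it checks the types here anyway.
    return isinstance(person.get("name"), str) and isinstance(person.get("age"), int)

print([valid(p) for p in people])  # [True, False]
```

The point being made: this check is needed whether or not the syntax distinguished `35` from `"35"`, because syntax-level typing cannot catch fields that are well-typed but in the wrong place.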

    • I guess there are two questions: should the serialization format be coupled with the schema system, and should the serialization format have types.

      If you answer the first question with no, then the second question is revealed to just be about various considerations other than validation, such as legibility and obvious mapping to language types (such as having a common notation for symbols/keywords, sets, etc).

      JSON and EDN are similar here, if your comment was in the context of the JSON vs EDN difference. There's some incidental additional checking at the syntax level with EDN, but that's not its purpose.

      You can do interesting things with the data even if you don't parse/validate all of it.

      E.g., an important feature of the spec schema system and its philosophy is that you don't want closed specs: you want code to be able to handle and pass on data that is richer than what the code knows about, and if circumstances allow, you shouldn't try to validate it all in one place.
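A tiny Python sketch of that "open" validation style (the `KNOWN` map and the `check_known` helper are illustrative assumptions, not from any real spec definition):

```python
# Check only the keys this code knows about, and pass everything else
# through untouched so downstream consumers can use richer data.
KNOWN = {"name": str, "age": int}

def check_known(record):
    for key, typ in KNOWN.items():
        if key in record and not isinstance(record[key], typ):
            raise TypeError(f"{key} should be {typ.__name__}")
    return record  # unknown keys survive for whoever comes next

rec = check_known({"name": "Fred", "age": 35, "employer": "Slate Rock"})
print(sorted(rec))  # ['age', 'employer', 'name']
```

Note that `employer` is neither validated nor dropped: the spec is open, so the record flows through with more in it than this code understands.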

    • What do you mean by "syntax typing", and what complications in the syntax?

      > The whole world seems to run on JSON

      That is true, and I don't like that :)

      From my perspective JSON syntax is too "light", and that translates into many complications, typically in the form of conventions like: {"id": {"__MW__type": "LONG NUMBER", "value": "9999999999999999999999999"}}.
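For illustration, a short Python sketch that decodes that kind of convention with a `json.loads` object hook (the `__MW__type` tag comes from the example above; `untag` is a hypothetical helper, not part of any standard):

```python
import json

def untag(obj):
    # json calls this hook for every decoded object, innermost first,
    # so tagged wrappers can be replaced by real Python values.
    if obj.get("__MW__type") == "LONG NUMBER":
        return int(obj["value"])
    return obj

doc = '{"id": {"__MW__type": "LONG NUMBER", "value": "9999999999999999999999999"}}'
data = json.loads(doc, object_hook=untag)
print(data["id"] + 1)  # 10000000000000000000000000
```

The convention exists precisely because JSON's own number type cannot safely carry an arbitrary-precision integer, so the "light" syntax pushes the typing work into application-level agreements like this one.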
