
Comment by geocar

5 months ago

> It seems to me that any XML structure can be represented in JSON

Well it can't: JSON has no processing instructions, no references, no comments, JSON "numbers" are problematic, and JSON arrays can't have attributes, so you're stuck with some kind of additional protocol that maps the two.

For something that is basically text (like an HTML document) or a list of dictionaries (like RSS) it may not seem obvious what the value of these things is (or even what they mean, if you have little exposure to XML), so I'll try and explain some of that.

1. Processing instructions are like <?xml?> and <?xml-stylesheet?> -- these let your application embed linear processing instructions that you know are for the implementation, and so you know what your implementation needs to do with the information: If it doesn't need to do anything, you can ignore them easily, because they are (parsewise) distinct.

2. References (called entities) are created with <!ENTITY x ...> and then you use them as &x; maybe you are familiar with &lt; representing < but this is not mere string replacement: you can work with the pre-parsed entity object (for example, if it's an image), or treat it as a reference (which can make circular objects possible to represent in XML), neither of which is possible in JSON. Entities can live behind an external URI as well.

3. Comments are for humans. Lots of people put special {"comment":"xxx"} objects in their JSON, so you need to understand that protocol and filter it. They are obvious (like the processing instructions) in XML.

4. JSON numbers fold into floats of different sizes in different implementations, so you have to avoid them in interchange protocols. This is annoying and bug-prone.

5. Attributes are the things on XML tags <foo bar="42">...</foo> - Some people map this in JSON as {"bar":"42","children":[...],"tag":"foo"} and others like ["foo",{"bar":"42"},...] but you have to make a decision -- the former may be difficult to parse in a streaming way, but the latter creates additional nesting levels.
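To make points 4 and 5 concrete, here is a minimal Python sketch using only the standard library. The helper names (to_object_form, to_array_form) are made up for illustration and are not any standard mapping; it ignores text content and namespaces, and it emulates a float-only JSON parser with parse_int=float, which is an assumption about how such implementations behave rather than a description of any specific one.

```python
import json
import xml.etree.ElementTree as ET


def to_object_form(elem):
    """Map an element to {"tag": ..., <attributes>, "children": [...]}.

    Caveat: an attribute literally named "tag" or "children" would collide
    with the structural keys, so a real protocol needs a rule for that.
    """
    node = {"tag": elem.tag, **elem.attrib}
    node["children"] = [to_object_form(child) for child in elem]
    return node


def to_array_form(elem):
    """Map an element to ["tag", {attributes}, child, child, ...]."""
    return [elem.tag, dict(elem.attrib)] + [to_array_form(child) for child in elem]


root = ET.fromstring('<foo bar="42"><baz/></foo>')
object_form = to_object_form(root)
array_form = to_array_form(root)
print(json.dumps(object_form, sort_keys=True))
print(json.dumps(array_form))

# Point 4: the same JSON text yields different values depending on the parser.
big = 2**53 + 1
exact = json.loads(str(big))                    # Python keeps an exact int
folded = json.loads(str(big), parse_int=float)  # emulates a float64-only parser
print(exact == big, int(folded) == big)
```

Both shapes carry the same information, but a consumer has to know which one the producer picked, and the last line shows why large identifiers or amounts tend to get shipped as strings.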

None of this is insurmountable: You can obviously encapsulate almost anything in almost anything else, but think about all the extra work you're doing, and how much risk there is in that code working forever!

For me: I process financial/business data mostly in XML, so it is very important I am confident my implementation is correct, because shit happens as the result of that document getting to me. Having the vendor provide a spec any XML software can understand helps us have a machine-readable contract, but I am getting a number of new vendors who want to use JSON, and I will tell you their APIs never work: They will give me OpenAPI and Swagger "templates" that just don't validate, and type-coding always requires extra parsing of the strings the JSON parsing comes back with. If there's a pager interface: I have to implement special logic for that (this is built-in to XML). If they implement dates, sometimes it's unix-time, sometimes it's 1000x off from that, sometimes it's an ISO8601-inspired string, and fuck sometimes I just get an HTTP date. And so on.
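As a rough illustration of the "extra parsing" the date case needs, here is a hedged Python sketch of a normalizer for the four shapes listed above. The function name normalize_timestamp and the MILLIS_THRESHOLD cutoff are assumptions invented for this example; guessing seconds versus milliseconds from magnitude is a heuristic, not something any spec guarantees.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Assumption: numbers at or above this magnitude are milliseconds.
# 10**11 ms falls in 1973, so older millisecond values would be misread.
MILLIS_THRESHOLD = 10**11


def normalize_timestamp(value):
    """Best-effort conversion of a JSON date value to an aware UTC datetime."""
    if isinstance(value, bool):
        raise TypeError("booleans are not timestamps")
    if isinstance(value, (int, float)):
        seconds = value / 1000 if abs(value) >= MILLIS_THRESHOLD else value
        return datetime.fromtimestamp(seconds, tz=timezone.utc)
    if isinstance(value, str):
        text = value.strip()
        iso_text = text[:-1] + "+00:00" if text.endswith("Z") else text
        try:
            parsed = datetime.fromisoformat(iso_text)
        except ValueError:
            # Fall back to RFC 2822 / HTTP-style dates; raises if that fails too.
            parsed = parsedate_to_datetime(text)
        if parsed.tzinfo is None:
            # Assumption: strings without an offset are treated as UTC.
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed.astimezone(timezone.utc)
    raise TypeError(f"unsupported timestamp type: {type(value).__name__}")


samples = [
    1704164645,                       # unix seconds
    1704164645000,                    # "1000x off" (milliseconds)
    "2024-01-02T03:04:05Z",           # ISO 8601-style string
    "Tue, 02 Jan 2024 03:04:05 GMT",  # HTTP date
]
for sample in samples:
    print(repr(sample), "->", normalize_timestamp(sample).isoformat())
```

Every branch here is a guess about what the other side meant, which is exactly the kind of code that has to keep working forever.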

So I am always finding JSON that I wish were XML, because (in my use-cases) XML is just plain better than JSON, but if you do a lot in languages with poor XML support (like JavaScript, Python, etc.) all of these things will seem hard enough that you might think JSON+xyz is a good alternative (especially if you like JSON), so I understand the need for stuff like "xee" to make XML more accessible so that people stop doing so much with JSON. I don't know Rust well enough to know if xee does that, but I understand fully the need.

> <!ENTITY x ...> and then you use them as &x; maybe you are familiar with &lt; representing <

Okay. This is syntactically painful, APL or J tier. C++ just uses "&" to indicate a reference. That's a lot of people's issue with XML, you get the syntactic pain of APL with the verbosity pain of Java.

> I have to implement special logic for that (this is built-in to XML). If they implement dates, sometimes it's unix-time, sometimes it's 1000x off from that, sometimes it's a ISO8601-inspired string, and fuck sometimes I just get an HTTP date. And so on.

Special logic is built into every real-world programming scenario ever. It just means the programmer had to diverge from the ideal to make something work. Unpleasant but vanilla and common. I don't see how XML magically solved the date issue forever. For example, I could just toss in <date>UNIXtime</date> or <date time="microseconds since 1997">324234234</date> or <datecontainer><measurement units="femtoseconds since 1776"><value>3234234234234</value></measurement></datecontainer>. The argument seems to be "ah yes, but if everyone uses this XML date feature it's solved!" but not so. It's a special case of "if everyone did the same thing, it would be solved". But nobody does the same thing.

  • I think you have a totally skewed idea about what is going on.

    Most protocols are used by exactly two parties; I meet someone who wants to have their computer talk to mine and so we have to agree on a protocol for doing so.

    When we agree to use XML, they use that exact date format because I just ask for it. If someone wanted me to produce some weird timestamp-format, I'd ask for whatever xslt they want to include in the payload.

    When we agree to use JSON, the schema says integers, the email says "unix time", in integration testing we discover it's "whatever Date.now() says", and a few months later I discover their computer doesn't know the difference between UTC and GMT.

    Also: I like APL.

I think I can see something of where you're coming from. But a question:

You complain about dates in JSON (really a specific case of parsing text in JSON):

> If they implement dates, sometimes it's unix-time, sometimes it's 1000x off from that, sometimes it's a ISO8601-inspired string, and fuck sometimes I just get an HTTP date. And so on.

Sure, but does not XML have the exact same problem because everything is just a text?

  • > Sure, but does not XML have the exact same problem because everything is just a text?

    No, you can specify what type an attribute (or element) is in the XSD (for example, xs:dateTime or xs:date). And those types allow only one way to write a date, which is based on ISO8601. Of course JSON Schema does exist, but it's mostly an afterthought.

    • It sounds to me like you are thinking something like: if they use XML, they'll have a well defined schema and will follow standardized XML types. But if they use JSON they may not have a well-defined schema at all, and may not follow any sort of standardized formats.

      But to my mind, whether they have a well-defined schema and follow proper datatypes really has very little to do with the choice of XML or JSON.