Comment by bayindirh
3 months ago
XML is not only a file format. It's a complete ecosystem built around that format: protocols, validators, and other file formats built on top of XML.
You can take XML and convert it to anything. I use it to model 3D objects, for example, and the model allows for some neat programming tricks while being efficient and, more importantly, human readable.
Except for being small, JSON is the worst of both worlds: a hacky K/V store, at best.
Calling XML human readable is a stretch. It can be with some tooling, but JSON is easier to read both with and without tooling. How human-readable a serialization is depends partly on the schema, but I know significantly fewer people who can parse an XML file by sight than JSON.
Efficient is also... questionable. It requires full Turing-machine power just to validate, IIRC (it surely does to fully parse). By which metric is XML efficient?
By efficiency, I mean it's text and compresses well. If we mean speed, there are extremely fast XML parsers around see this page [0] for state of the art.
For hands-on experience, I used rapidxml for parsing said 3D object files. A 116K XML file is parsed instantly (the rapidxml library aims for speed parity with strlen() on the same file, and it delivers).
Converting the same XML to my own memory model took less than 1ms including creation of classes and interlinking them.
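The parse-then-interlink step can be sketched like this (a Python illustration using the standard library's ElementTree rather than the rapidxml/C++ code the comment describes; the Model/Vertex format is made up):

```python
import xml.etree.ElementTree as ET

class Vertex:
    """Illustrative in-memory node built from one XML element."""
    def __init__(self, vid, x, y, z):
        self.vid, self.x, self.y, self.z = vid, x, y, z

# A made-up 3D-object file; real model formats will differ.
doc = """
<Model>
  <Vertex id="0" x="0.0" y="0.0" z="0.0"/>
  <Vertex id="1" x="1.0" y="0.0" z="0.0"/>
</Model>
"""

root = ET.fromstring(doc)
# Convert each element into a class instance and interlink them via a dict.
vertices = {
    int(e.get("id")): Vertex(int(e.get("id")), float(e.get("x")),
                             float(e.get("y")), float(e.get("z")))
    for e in root.findall("Vertex")
}
print(len(vertices))  # 2
```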
This was on 2010s era hardware (a 3rd generation i7 3770K to be precise).
Validating the same file against an XSD schema would add some milliseconds, not more. Considering the core of the problem might take hours on end, torturing memory and CPU, a 20ms overhead is basically free.
I believe JSON and XML's readability is directly correlated with how the file is designed and written (incl. terminology and how it's formatted), but to be frank, I have seen both good and bad examples on both.
If you can mentally parse HTML, you can mentally parse XML. I tend to learn to parse any markup and programming language mentally so I can simulate them in my mind, but I might be an outlier.
If you're designing a file format based on either one for computers only, ending up with something approaching Perl-level regular expressions is not hard.
Oops, forgot the link:
[0]: https://pugixml.org/benchmark.html
> Calling XML human readable is a stretch.
That’s always been the main flaw of XML.
There are very few use cases where you wouldn't be better served by an equivalent, more efficient binary format.
You will need a tool to debug XML anyway as soon as it gets a bit complex.
With this you get an efficient binary format and the generality of XML:
https://en.m.wikipedia.org/wiki/Efficient_XML_Interchange
But somehow Google forgot to implement this.
A simple text editor of today (Vim, Kate) can sanity-check an XML file in real time. Why debug?
4 replies →
It's kinda funny to see "not human readable" as an argument in favor of JSON over XML, when the former doesn't even have comments.
And yet, it's still easier for me to parse with my eyes
I mean, at least JSON has a native syntax to indicate an array, unlike XML which requires that you tack on a schema.
<MyRoot>
  <AnElement>
    <Item></Item>
  </AnElement>
</MyRoot>
Serialize that to a JavaScript object, then tell me, is "AnElement" a list or not?
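The ambiguity is easy to demonstrate with a naive XML-to-dict converter (a Python sketch; the converter logic is mine, not any standard library's behavior):

```python
import xml.etree.ElementTree as ET

def to_obj(elem):
    # Naive conversion: with no schema, we can only guess "list" from
    # the child count, which breaks for single-element lists.
    children = list(elem)
    if not children:
        return elem.text
    names = [c.tag for c in children]
    if len(set(names)) == 1 and len(children) > 1:
        return {names[0]: [to_obj(c) for c in children]}
    return {c.tag: to_obj(c) for c in children}

one  = ET.fromstring("<AnElement><Item>a</Item></AnElement>")
many = ET.fromstring("<AnElement><Item>a</Item><Item>b</Item></AnElement>")

print(to_obj(one))   # {'Item': 'a'}        -- looks like a scalar field
print(to_obj(many))  # {'Item': ['a', 'b']} -- suddenly a list
```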
That's one of the reasons why XML is completely useless on the web. The web is full of XML that doesn't have a schema because writing one is a miserable experience.
This is why you can have attributes in a tag: you can make an XML file self-explanatory.
Consider type-aware parsing, for example: most parsers support it, so if somebody tucks a string into a place where you expect an integer, you can get an error, nil, or "0", depending on your choice.
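One way the attribute idea can look in practice (a sketch; the `type` attribute convention here is invented for illustration, not a standard):

```python
import xml.etree.ElementTree as ET

doc = '<Config><Port type="int">8080</Port><Host type="string">db1</Host></Config>'

def typed_value(elem):
    # Honor the element's own type annotation; fall back to 0 on bad
    # input, mirroring the error / nil / "0" choices mentioned above.
    if elem.get("type") == "int":
        try:
            return int(elem.text)
        except (TypeError, ValueError):
            return 0
    return elem.text

root = ET.fromstring(doc)
values = {e.tag: typed_value(e) for e in root}
print(values)  # {'Port': 8080, 'Host': 'db1'}
```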
I had the displeasure of parsing XML documents (into Rust) recently. I don't ever want to do this again.
JSON, for all its flaws, is beautifully simple in comparison. A number is either a number or the document is invalid. Arrays are just arrays, and objects are just objects.
XML on the other hand is the wild west. This particular XML beast had some difficulty sticking to one thing.
Take lists, for instance. The same document had two different ways to do them:
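The original examples aren't shown, but the two usual XML list encodings look something like this (illustrative tag names, sketched in Python):

```python
import xml.etree.ElementTree as ET

# Style 1: an explicit wrapper element around repeated children.
wrapped = ET.fromstring("<Items><Item>a</Item><Item>b</Item></Items>")
# Style 2: repeated siblings mixed directly into the parent element.
inline = ET.fromstring("<Thing><Item>a</Item><Item>b</Item><Name>x</Name></Thing>")

# Each style needs its own traversal code to yield the same list.
list1 = [e.text for e in wrapped.findall("Item")]
list2 = [e.text for e in inline.findall("Item")]
print(list1 == list2 == ["a", "b"])  # True
```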
Various values were scattered between attributes and child elements with no rhyme or reason.
To prevent code reuse, some element names were namespaced, so you might have <ThingName /> and <FooName />.
To round off my already awful day, some numbers were formatted with thousands separators. Of course, these can change depending on your geographical location.
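To make the separator problem concrete: naively stripping "," works for one locale and silently corrupts data from another (a Python sketch; the sample strings are mine):

```python
def naive_parse(s):
    # The obvious fix for US-style "1,234.56": drop the commas.
    return float(s.replace(",", ""))

print(naive_parse("1,234.56"))  # 1234.56 -- fine
# ...but German-style "1.234,56" becomes "1.23456": no error,
# just a value that is silently off by a factor of 1000.
print(naive_parse("1.234,56"))  # 1.23456
```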
Now, one could say that this is just the fault of the specific XML files I was parsing. And while I would partially agree, the fact that the format makes this possible is a sign of its quality.
Since there's no clear distinction between objects and arrays you have to pick one. Or multiple.
Since objects can be represented with both attributes and children you have to pick one. Or both.
Since there are no numbers in XML, you can just write them out any way you want. Multiple ways is of course preferable.
3 replies →
it's a lot of things, none of them in the browser anymore
RSS says hi!
as much as it pains me to say it, that ship has also sailed
3 replies →