Comment by bayindirh
3 months ago
XML is not only a file format. It's a complete ecosystem built around that format: protocols, validators, and other file formats built on top of XML.
You can take XML and convert it to anything. I use it to model 3D objects, for example, and the model allows for some neat programming tricks while being efficient and, more importantly, human readable.
Except for being small, JSON is the worst of both worlds: a hacky K/V store, at best.
Calling XML human readable is a stretch. It can be with some tooling, but JSON is easier to read both with and without tooling. How human-readable a serialization is depends partly on the schema, but I know significantly fewer people who can parse an XML file by sight than JSON.
Efficient is also... questionable. It requires full Turing-machine power just to validate, IIRC (it surely does to fully parse). By which metric is XML efficient?
By efficiency, I mean it's text and compresses well. If we mean speed, there are extremely fast XML parsers around see this page [0] for state of the art.
For hands-on experience, I used rapidxml for parsing said 3D object files. A 116K XML file is parsed instantly (the rapidxml library aims for speed parity with strlen() on the same file, and it delivers).
Converting the same XML to my own memory model took less than 1ms including creation of classes and interlinking them.
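The parse-then-interlink step can be sketched like this (a Python illustration using the standard library's ElementTree rather than the rapidxml/C++ code the comment describes; the Model/Vertex format is made up):

```python
import xml.etree.ElementTree as ET

class Vertex:
    """Illustrative in-memory node built from one XML element."""
    def __init__(self, vid, x, y, z):
        self.vid, self.x, self.y, self.z = vid, x, y, z

# A made-up 3D-object file; real model formats will differ.
doc = """
<Model>
  <Vertex id="0" x="0.0" y="0.0" z="0.0"/>
  <Vertex id="1" x="1.0" y="0.0" z="0.0"/>
</Model>
"""

root = ET.fromstring(doc)
# Convert each element into a class instance and interlink them via a dict.
vertices = {
    int(e.get("id")): Vertex(int(e.get("id")), float(e.get("x")),
                             float(e.get("y")), float(e.get("z")))
    for e in root.findall("Vertex")
}
print(len(vertices))  # 2
```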
This was on 2010s era hardware (a 3rd generation i7 3770K to be precise).
Validating the same file against an XSD schema would add some milliseconds, not more. Considering the core of the problem might take hours on end, torturing memory and CPU, a 20ms overhead is basically free.
I believe JSON and XML's readability is directly correlated with how the file is designed and written (incl. terminology and how it's formatted), but to be frank, I have seen both good and bad examples on both.
If you can mentally parse HTML, you can mentally parse XML. I tend to learn to parse any markup and programming language mentally so I can simulate them in my mind, but I might be an outlier.
If you're designing a file format based on either one for computers only, ending up with something approaching Perl-level regular expressions is not hard.
Oops, forgot the link:
[0]: https://pugixml.org/benchmark.html
> Calling XML human readable is a stretch.
That’s always been the main flaw of XML.
There are very few use cases where you wouldn't be better served by an equivalent, more efficient binary format.
You will need a tool to debug XML anyway as soon as it gets a bit complex.
With this you get an efficient binary format and the generality of XML:
https://en.m.wikipedia.org/wiki/Efficient_XML_Interchange
But somehow Google forgot to implement this.
A simple text editor of today (Vim, Kate) can sanity-check an XML file in real time. Why debug?
4 replies →
It's kinda funny to see "not human readable" as an argument in favor of JSON over XML, when the former doesn't even have comments.
And yet, it's still easier for me to parse with my eyes
I mean, at least JSON has a native syntax to indicate an array, unlike XML which requires that you tack on a schema.
<MyRoot>
  <AnElement>
    <Item></Item>
  </AnElement>
</MyRoot>
Serialize that to a JavaScript object, then tell me, is "AnElement" a list or not?
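The ambiguity is easy to demonstrate with a naive XML-to-dict converter (a Python sketch; the converter logic is mine, not any standard library's behavior):

```python
import xml.etree.ElementTree as ET

def to_obj(elem):
    # Naive conversion: with no schema, we can only guess "list" from
    # the child count, which breaks for single-element lists.
    children = list(elem)
    if not children:
        return elem.text
    names = [c.tag for c in children]
    if len(set(names)) == 1 and len(children) > 1:
        return {names[0]: [to_obj(c) for c in children]}
    return {c.tag: to_obj(c) for c in children}

one  = ET.fromstring("<AnElement><Item>a</Item></AnElement>")
many = ET.fromstring("<AnElement><Item>a</Item><Item>b</Item></AnElement>")

print(to_obj(one))   # {'Item': 'a'}        -- looks like a scalar field
print(to_obj(many))  # {'Item': ['a', 'b']} -- suddenly a list
```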
That's one of the reasons why XML is completely useless on the web. The web is full of XML that doesn't have a schema because writing one is a miserable experience.
This is why you can have attributes in a tag: you can make an XML file self-explanatory.
Consider type-aware parsing, for example: most parsers support it, so if somebody tucks a string into a place where you expect an integer, you can get an error, nil, or "0", depending on your choice.
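One way the attribute idea can look in practice (a sketch; the `type` attribute convention here is invented for illustration, not a standard):

```python
import xml.etree.ElementTree as ET

doc = '<Config><Port type="int">8080</Port><Host type="string">db1</Host></Config>'

def typed_value(elem):
    # Honor the element's own type annotation; fall back to 0 on bad
    # input, mirroring the error / nil / "0" choices mentioned above.
    if elem.get("type") == "int":
        try:
            return int(elem.text)
        except (TypeError, ValueError):
            return 0
    return elem.text

root = ET.fromstring(doc)
values = {e.tag: typed_value(e) for e in root}
print(values)  # {'Port': 8080, 'Host': 'db1'}
```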
I had the displeasure of parsing XML documents (into Rust) recently. I don't ever want to do this again.
JSON, for all its flaws, is beautifully simple in comparison. A number is either a number or the document is invalid. Arrays are just arrays, and objects are just objects.
XML on the other hand is the wild west. This particular XML beast had some difficulty sticking to one thing.
Take lists, for instance. The same document had two different ways to do them:
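The original examples aren't shown, but the two usual XML list encodings look something like this (illustrative tag names, sketched in Python):

```python
import xml.etree.ElementTree as ET

# Style 1: an explicit wrapper element around repeated children.
wrapped = ET.fromstring("<Items><Item>a</Item><Item>b</Item></Items>")
# Style 2: repeated siblings mixed directly into the parent element.
inline = ET.fromstring("<Thing><Item>a</Item><Item>b</Item><Name>x</Name></Thing>")

# Each style needs its own traversal code to yield the same list.
list1 = [e.text for e in wrapped.findall("Item")]
list2 = [e.text for e in inline.findall("Item")]
print(list1 == list2 == ["a", "b"])  # True
```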
Various values were scattered between attributes and child elements with no rhyme or reason.
To prevent code reuse, some element names were namespaced, so you might have <ThingName /> and <FooName />.
To round off my already awful day, some numbers were formatted with thousands separators. Of course, these can change depending on your geographical location.
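To make the separator problem concrete: naively stripping "," works for one locale and silently corrupts data from another (a Python sketch; the sample strings are mine):

```python
def naive_parse(s):
    # The obvious fix for US-style "1,234.56": drop the commas.
    return float(s.replace(",", ""))

print(naive_parse("1,234.56"))  # 1234.56 -- fine
# ...but German-style "1.234,56" becomes "1.23456": no error,
# just a value that is silently off by a factor of 1000.
print(naive_parse("1.234,56"))  # 1.23456
```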
Now, one could say that this is just the fault of the specific XML files I was parsing. And while I would partially agree, the fact that the format makes this possible is a sign of its quality.
Since there's no clear distinction between objects and arrays you have to pick one. Or multiple.
Since objects can be represented with both attributes and children you have to pick one. Or both.
Since there are no numbers in XML, you can just write them out any way you want. Multiple ways is of course preferable.
3 replies →
it's a lot of things, none of them in the browser anymore
RSS says hi!
as much as it pains me to say it, that ship has also sailed
3 replies →