
Comment by mickeyp

5 months ago

It's interesting to see the slow rehabilitation of XML and its tooling now that there's a new generation of developers who did not grow up in the shadow of XML's prime in the late 90s / early 2000s, and who have not heard (or did not buy into) the anti-XML crowd's ranting, even though some of their criticisms were legitimate.

I've always liked XML, and especially XPath, and even though there were a large number of missteps in the heyday of XML, I feel it has always been unfairly maligned. Look at all the people who reinvent XML tooling for JSON, only not nearly as well. Luckily, people who value XML can still use it, provided the fit is right. But it's nice to see the tides turning.

Most fashions really are cyclical.

It’s the “slope of enlightenment” phase of the Gartner hype cycle, where people are able to make sober assessments of technologies without undue influence from hype or its backlash. We’re long past the days where XML is used for everything, even when it’s inappropriate, and we’re also past the “trough of disillusionment” phase where people sought alternatives to XML.

I think XML is good for expressing document formats and for configuration settings. I prefer JSON for data serialization, though.

I made extensive use of XPath and XSL(T) back in their heyday and in general was fine with them, but the architect astronauts who love showing off how clever they are with artificial complexity had a tendency to use XML tech to complicate things unnecessarily. I think that might be where a lot of the dislike came from, especially among people whose first exposure wasn't learning simple structures when XML was new, but being thrown into the kind of morass that develops as a technology climbs the maturity curve.

I manage a team of business analysts and accountants who use XSLT to generate reports for banks. XSLT is usually their first experience programming beyond a few LinkedIn Learning courses. Not once has one of them complained about namespaces, verbosity, or anything like it; that's something I only see on HN or the programming subreddits.

For the vast, vast majority of devs, their only experience of XML is what they hear secondhand. I'm sure a lot more would like it if they tried it.

My complaints about XML remain pretty much unchanged since 10 years ago.

- Not including self-closing tags, there should only be one close tag: </>

- Elements are for data. Attributes are evil

- XPath indexing should be 0-based

- Documents without a schema should not make your tools panic or complain

- An XML document shouldn't have to waste its time telling you it's an XML document in XML

I maintain that one of the reasons JSON got so popular so quickly is because it does all of the above. The problem with JSON is that you lose the benefits of having a schema to validate against.
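A small illustration of the indexing and schema points above, sketched with Python's standard-library ElementTree (my choice of tool, not something from the thread; it implements only a limited XPath subset, not full XPath 1.0). The sample document is made up: it has no `<?xml ...?>` declaration and no DTD or XSD, and it still parses. Position predicates in the path start at 1, while the Python list returned by `findall` starts at 0.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: no XML declaration, no DTD, no XSD.
DOC = """<orders>
  <order id="a1"><total>10.50</total></order>
  <order id="b2"><total>7.25</total></order>
  <order id="c3"><total>3.00</total></order>
</orders>"""

# Parses without complaint; nothing is validated against a schema here.
root = ET.fromstring(DOC)

# ElementTree's XPath subset uses 1-based positions, as XPath does.
first = root.find("order[1]")
last = root.find("order[last()]")

# An attribute predicate followed by a child step.
b2_total = root.find("order[@id='b2']/total").text

# The plain Python list of the same elements is 0-based.
orders = root.findall("order")

print(first.get("id"), orders[0].get("id"), last.get("id"), b2_total)
# prints: a1 a1 c3 7.25
```

The first element is `order[1]` in the path but `orders[0]` in Python, which is exactly the mismatch the list above complains about.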

  • Microsoft seems to be especially obsessed with making as much as possible into attributes. Makes me wonder if there is some hidden historical reason for that, like an especially powerful evangelist inside the company who loved attributes during the early days of adopting XML.

    • Attributes are way shorter to write.

      That said, these days most Microsoft XML dialects are actually XAML-based, and in XAML attributes are basically syntactic sugar. You can write:

        <Foo Bar="123" />

      or

        <Foo>
          <Foo.Bar>123</Foo.Bar>
        </Foo>

      (the dot in the syntax makes it possible for the XAML parser to distinguish nested elements that represent properties from nested elements that represent child objects)

  • > Elements are for data. Attributes are evil

    This is like, your opinion, man... ;-) You can devise your schema any way you want. Attributes are great, and they exist in HTML in the form of data-* attributes (exposed through `dataset`), which, as usual, are a poorly specified and ill-designed rethinking of XML attributes.

    > Documents without a schema should not make your tools panic or complain

    They don't. You absolutely don't need a schema. If you declare a schema, it should exist. If not, no problem?

    • No, the problem with attributes is that people consistently misuse them. So many things about XML break down when you make everything a self-closing tag with 50 attributes. So many programmers just seem to say, "oh, it's shorter text so it must be inherently better" or "oh, it's one-to-one so I should strictly avoid anything resembling a hierarchy."

      Like I think this guy is mostly correct in identifying bad XML: https://www.devever.net/~hl/xml

      Though I don't necessarily agree with the "data format" framing. This idea that markup languages are not data formats seems confused.

      > They don't. You absolutely don't need a schema. If you declare a schema, it should exist. If not, no problem?

      I agree that they should not.

      However, I have used many tools that puke when presented with XML fragments or XML with no schema.


  • There were proposals a long time ago, including by Tim Bray, for an XML 2.0 that would remove some warts. But there was no appetite in the industry to move forward.

XML/XPath are very useful, but I've definitely lived through their abuses. Still, abusus non tollit usum (abuse does not take away use), and I've had many positive experiences with XPath especially. XmlStarlet has been particularly useful, as has xmllint. I welcome more tooling like this. The major downsides to XML are the verbosity and cognitive load. Tooling that manages that is a godsend.

XML is still a huge mistake for most stuff. It's fine for _documents_ but not as a data storage solution. Bloat, ambiguities, virtually impossible to canonicalise.

XPath is cute, but if you don't mind bloat, text-only representation, and poor ergonomics anyway, then conjunctive regular path queries and RDF are miles ahead of XML as a data storage solution. (Not serialised as XML, please xD)

Curiously, one of the driving forces behind renewed interest in XML is that language models seem to handle large XML documents better than JSON. I suspect this has something to do with it being more redundant - e.g. closing tags including the element name - making it easier for the model to keep track of structure.

XML and the other X-something standards are just horrible to read. On top of that, XML was made 10x worse by wrapping things in SOAP and the like over the wire, back in the day.

XSD, XPath, and XSLT are all domains where I'd argue that being able to read and reason about what's going on matters far more.

When troubleshooting an issue, I don't mind scanning XML for a few data points so I can confirm what values are being communicated, but when I need to figure out how/why a specific value came to be, I don't want the logic spread throughout a giant text file wrapped in attribute value strings, and other non-debuggable "code". I'd rather it just be in a proper programming language.

  • The specifications are certainly not easy to read, and I wouldn't recommend them to learn about XML. But from the perspective of someone implementing them they are quite useful!

    As someone who has used many programming languages and who went through the process of implementing this one I have many opinions about XPath and XSLT as programming languages. I myself am more interested in implementing them for others who value using them than using them myself. I do recognize there is a sizeable community of people who do use these tools and are passionate about them - and that's interesting to see and more power to them!

It's only a sample of one, but I'm really unhappy with the issues and limitations that JSON and YAML have, and I welcome XML if it has good tools.

  • That depends on what I'm doing. Most of what I'm doing is simple, so XML is just way too complex for the task. However, when I need something complex, XML can handle things that the others cannot, at the expense of being really complex to work with.