Comment by da_chicken

5 months ago

My complaints about XML remain pretty much unchanged since 10 years ago.

- Not including self-closing tags, there should only be one close tag: </>

- Elements are for data. Attributes are evil

- XPath indexing should be 0-based

- Documents without a schema should not make your tools panic or complain

- An XML document shouldn't have to waste its time telling you it's an XML document in XML
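
To make the indexing complaint concrete, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree`, which implements only a small subset of XPath. The `<list>`/`<item>` document is made up for illustration. It shows the same element being position 1 in an XPath predicate and index 0 in the host language:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, not taken from any real schema.
doc = ET.fromstring("<list><item>a</item><item>b</item><item>c</item></list>")

# XPath positional predicates count from 1, so item[1] is the first item.
first_by_xpath = doc.find("item[1]")

# Python lists count from 0, so the same element is index 0 here.
first_by_index = doc.findall("item")[0]

print(first_by_xpath.text, first_by_index.text)
```

Mixing the two conventions in one program is where the off-by-one mistakes tend to creep in.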

I maintain that one of the reasons JSON got so popular so quickly is that it does all of the above. The problem with JSON is that you lose the benefits of having a schema to validate against.

Microsoft seems to be especially obsessed with making as much as possible into attributes. Makes me wonder if there is some hidden historical reason for that, like an especially powerful evangelist inside the company who loved attributes during the early days of adopting XML.

  • Attributes are way shorter to write.

    That said, these days most Microsoft XML dialects are actually XAML-based, and in XAML attributes are basically syntactic sugar - you can write:

      <Foo Bar="123" />

    or

      <Foo>
        <Foo.Bar>123</Foo.Bar>
      </Foo>

    (the dot in the syntax makes it possible for the XAML parser to distinguish nested elements that represent properties from nested elements that represent child objects)
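
As a rough illustration of that dot convention (this is not how the real XAML parser works, it ignores XML namespaces, and `read_properties` is a hypothetical helper), a reader could treat attributes and dotted child elements as the same thing like this:

```python
import xml.etree.ElementTree as ET

def read_properties(element):
    """Collect properties from attributes and from <Owner.Prop> child elements."""
    props = dict(element.attrib)
    prefix = element.tag + "."
    for child in element:
        # A dotted child tag such as Foo.Bar names a property of Foo,
        # not a child object.
        if child.tag.startswith(prefix):
            props[child.tag[len(prefix):]] = (child.text or "").strip()
    return props

attribute_form = ET.fromstring('<Foo Bar="123" />')
element_form = ET.fromstring("<Foo><Foo.Bar>123</Foo.Bar></Foo>")

print(read_properties(attribute_form))
print(read_properties(element_form))
```

Both forms yield the same property mapping, which is the sense in which the attribute is just a shorthand.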

> Elements are for data. Attributes are evil

This is, like, your opinion, man... ;-) You can devise your schema any way you want. Attributes are great, and they exist in HTML in the form of `data-*` attributes (datasets), which, as usual, are a poorly specified and ill-designed rethinking of XML attributes.

> Documents without a schema should not make your tools panic or complain

They don't. You absolutely don't need a schema. If you declare a schema, it should exist. If not, no problem?
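
A quick sketch of the schema-less case, assuming Python's standard-library `xml.etree.ElementTree` (a non-validating parser) and a made-up `<note>` document. Note that the XML declaration is also optional in XML 1.0:

```python
import xml.etree.ElementTree as ET

# No <?xml ...?> declaration, no DOCTYPE, no schema reference.
bare = "<note><to>someone</to><body>hello</body></note>"

# The parser only checks well-formedness, so this parses without complaint.
root = ET.fromstring(bare)
body = root.findtext("body")

print(root.tag, body)
```

Other tools may behave differently, so this only shows that the format itself does not demand a schema.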

  • No, the problem with attributes is that people consistently misuse them. So many things about XML break down when you make everything a self-closing tag with 50 attributes. So many programmers just seem to say, "oh, it's shorter text so it must be inherently better" or "oh, it's one-to-one so I should strictly avoid anything resembling a hierarchy."

    Like I think this guy is mostly correct in identifying bad XML: https://www.devever.net/~hl/xml

    Though I don't necessarily agree with the "data format" framing. This idea that markup languages are not data formats seems confused.

    > They don't. You absolutely don't need a schema. If you declare a schema, it should exist. If not, no problem?

    I agree that they should not.

    However, I have used many tools that puke when presented with XML fragments or XML with no schema.

  • Sometimes attribute use goes too far, such as when attributes contain comma-separated lists of items.

    • Sure, but that's not the fault of the format itself, is it? You can write extremely long enumerations in any natural language -- that's the author's fault.
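
For a concrete picture of the comma-separated case, here is a small sketch with an invented `<user>` element (the element and attribute names are assumptions for illustration only). It compares a list packed into an attribute with the same list written as child elements:

```python
import xml.etree.ElementTree as ET

# Hypothetical documents showing the two ways to encode the same list.
packed = ET.fromstring('<user roles="admin,editor,viewer" />')
nested = ET.fromstring(
    "<user><role>admin</role><role>editor</role><role>viewer</role></user>"
)

# Attribute form: the list structure is invisible to XML tools,
# so the consumer has to know the delimiter and split by hand.
roles_from_attribute = packed.get("roles").split(",")

# Element form: each item is a node the parser already understands.
roles_from_elements = [role.text for role in nested.findall("role")]

print(roles_from_attribute, roles_from_elements)
```

Either encoding is legal XML; the difference is only in who has to do the splitting.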

There were proposals a long time ago, including one by Tim Bray, for an XML 2.0 that would remove some warts. But there was no appetite in the industry to move forward.