Comment by pyuser583
17 hours ago
There is a fascinating alternative universe where XML standards actually took hold. I've seen it in bits and pieces. It would have been beautiful.
But that universe did not happen.
Lots of "modern" tooling works around the need. For example, in a world of Docker and Kubernetes, are those standards really that important?
I would blame the adoption of containerization for the lack of interest in XML standards, but by the time containerization happened, XML had been all but abandoned.
Maybe it was the adoption of Python, whose JSON libraries are much nicer than its XML ones. Maybe it was the fact that so few XML specs ever became mainstream.
In terms of effort, there is a huge tail in XML, where you're trying to get things working but getting little in return for that effort. XSLT is supposed to be the glue that keeps it all together, but there is no "it" to keep together.
XML also does not play very nice with streaming technologies.
I suspect that eventually XML will make a comeback. Or maybe another SGML dialect. But that time is not now.
I think the simplest explanation is that developers used it and did not like it.
The pro-XML narrative always sounded like what you wrote, as far back as I can remember: The XML people would tell you it was beautiful and perfect and better than everything as long as everyone would just do everything perfectly right at every step. Then you got into the real world and it was frustrating to deal with on every level. The realities of real-world development meant that the picture-perfect XML universe we were promised wasn't practical.
I don't understand your comparison to containerization. That feels like apples and oranges.
HTML was conceived as a language for marking up a document that was primarily text; XML took the tags and attributes from that and tried to turn it into a data serialization and exchange format. But it was never really well suited to that, and it's obvious from looking at XML-RPC or SOAP payloads that there were fundamental gaps in the ability of XML to encode type and structure information inline:
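For instance, a typical XML-RPC call has to spell out every type in dedicated wrapper elements (a representative sample; the method name and values are made up):

```xml
<?xml version="1.0"?>
<methodCall>
  <methodName>user.lookup</methodName>
  <params>
    <param>
      <value><struct>
        <member>
          <name>id</name>
          <value><int>42</int></value>
        </member>
        <member>
          <name>name</name>
          <value><string>alice</string></value>
        </member>
      </struct></value>
    </param>
  </params>
</methodCall>
```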
Compared to this, JSON had string and number types built in:
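For instance, a JSON-RPC call carries its types in the literals themselves — `42` is a number and `"alice"` is a string, with no wrapper elements needed (method and values made up):

```json
{
  "jsonrpc": "2.0",
  "method": "user.lookup",
  "params": { "id": 42, "name": "alice" },
  "id": 1
}
```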
I don't think this is the only factor, but I think XML had a lot of this kind of cognitive overhead built in, and that gave it a lot of friction when stacked up against JSON and later YAML... and when it came to communicating with an SPA, it was hard to compete with JS being able to natively eval the payload responses.
Yeah, this is the issue. I spent tons of time writing code that would consume XML and turn it into something useful.
It’s a mediocre data storage language.
To be fair, I cannot trust the shape of your JSON-RPC either: I am not sure if `id` is truly an integer or if you sent me an integer by mistake, and the same goes for `params` or even the payload of a param's param. This is why we ended up adopting OpenAPI for describing HTTP interactions, and IIRC JSON-RPC specifically can also be described with it. With XML, at least in the schema part, no one would say it is ambiguous. You also don't need heavier parsing: the object is a tree, there's no more checking for escaped strings, no more issues with hand-coded multiline strings, the need to separate values with commas is dropped because the end tag delimits the span, and so on.
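As a sketch of the schema point: a JSON Schema fragment like the following (field names hypothetical) pins down that `id` must be an integer, so none of that ambiguity is left in the shape:

```json
{
  "type": "object",
  "properties": {
    "id": { "type": "integer" },
    "params": { "type": "object" }
  },
  "required": ["id"]
}
```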
> developers used it and did not like it.
This makes sense.
However, there are two ways to address it:
1) Work towards a more advanced system that addresses the issues (for example, RDF/Turtle – expands XML namespaces to define classes and properties, represents graphs instead of being limited to trees unlike XML and JSON)
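For a feel of option 1, a minimal Turtle snippet (the `ex:` vocabulary is made up) stating graph-shaped facts that XML and JSON trees can only express with duplication or references:

```turtle
@prefix ex: <http://example.org/> .

ex:alice ex:knows ex:bob .
ex:bob   ex:knows ex:alice .   # a cycle: natural in a graph, awkward in a tree
```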
2) Throw it away and start from scratch. First, JSON. Then, JSON schema. Jq introduces a kind of "JSONPath". JSONL says hi to XML stream readers. JSONC because comments in config files are useful. And many more primitives that existed around XML were eventually reimplemented.
Note how the discussion around removing XSLT 1 support similarly has two ways forward: yank it out or support XSLT 3.
I lean towards Turtle replacing XML over JSON, and for XSLT 3 to replace XSLT 1 support in the browsers.
> And many more primitives that existed around XML were eventually reimplemented.
Don't miss that they were reimplemented properly.
Even XML schemas, the one thing you'd think they were great at, ended up seeing several different implementations beyond the original DTD-based schema definitions and beyond XSD.
Some XML things were absolute tire fires that should have been reimplemented even earlier, like XML-DSIG, SAML, SOAP, WS-everything.
It's not surprising devs ended up not liking it, there are actual issues trying to apply XML outside of its strengths. As with networking and the eventual conceit of "smart endpoints, dumb pipes" over ESBs, not all data formats are better off being "smart". Oftentimes the complexity of the business logic is better off in the application layer where you can use a real programming language.
1 reply →
The simplest explanation is that attributes were a mistake. They add another layer to the structure and create confusion as to where data is best stored within it.
XML without attributes probably would have seen wide and ready adoption.
I see it as the opposite. Attributes weren’t used enough. The result was unnecessarily nested code.
“Keep things flat” is current good advice in terms of usability. That means favor attributes over children.
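To illustrate the trade-off both comments are circling, here is the same (made-up) record, once with attributes and once with child elements; which form is "right" is exactly the choice XML forces on you every time:

```xml
<!-- attribute style: flat, one line per record -->
<user id="42" name="alice" role="admin"/>

<!-- child-element style: nested, one line per field -->
<user>
  <id>42</id>
  <name>alice</name>
  <role>admin</role>
</user>
```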
1 reply →
While I think a lot of XML was a bad idea, some of the issues are not intrinsically the fault of XML but of some really poor design decisions by people making XML-based languages.
They tended to be design-by-committee messes that included every possible use case as an option.
Anyone who has ever had the misfortune of having to deal with SAML knows what I'm talking about. It's a billion-line-long specification, everyone only implements 10% of it, and it's full of hidden gotchas that will screw up your security if you get them wrong. (Even worse, the underlying XML-Signature spec is literally the worst way to do digital signatures possible. It's so bad you'd think someone was intentionally sabotaging it.)
In theory this isn't XML's fault, but somehow XML seems to attract really bad spec designers.
Regarding containerization, XML wouldn’t just be a noun, but a verb (like in XSLT). You would define your remote procedures in XML.
Imagine if instead of the current Dockerfile format, we used XML, which was dynamically generated from lists of packages, and filtered and updated according to RSS feeds describing CVEs and package updates.
I’m not saying this is anything other than strange fantasy. And not a particularly nice fantasy either.
XML failed because it forced devs to spend tons of unproductive time on it
Part of the problem was it came in an era before we really understood programming, as a collective. We didn't even really know how to encapsulate objects properly, and you saw it in poor database schema designs, bizarre object inheritance patterns, poorly organised APIs, even the inconsistent method param orders in PHP. It was everywhere. Developers weren't good at laying out even POCOs.
And those bizarre designs went straight into XML: properties often in attributes, nodes that should have been attributes, over-nesting, etc.
And we blamed XML for the mess where often it was just inexperience in software design as an industry that was the real cause. But XML had too much flexibility compared to the simplicity of the later JSON, meaning it helped cause the problem. JSON 'solved' the problem by being simpler.
But then the flip side was that it was too strict and starting one in code was a tedious pita where you had to specify a schema even though it didn't exist or even matter most of the time.
Nah, we still have all those issues and more.
The hard truth is that XML lost to the javascript-native format (JSON). Any JavaScript-native format would have won, because "the web" effectively became the world of JavaScript. XML was not js-friendly enough: the parsing infrastructure was largely based on C/C++/Java, and then you'd get back objects with verbose interfaces (again, a c++/java thing) rather than the simple, nested dictionaries that less-skilled "JS-first" developers felt at ease with.
4 replies →
This is the abstract idealism I was talking about: Every pro-XML person I've talked to wants to discuss XML in the context of a hypothetical perfect world of programming that does not exist, not the world we inhabit.
The few staunch XML supporters I worked with always wanted to divert blame to something else, refusing to acknowledge that maybe XML was the wrong tool for the job or even contributing to the problems.
XML standards did "take hold". You couldn't get away from XML for a while. Then everyone came to their senses, thank god.
> I've seen it in bits and pieces. It would have been beautiful.
XHTML, being based on XML, tried to be a strict standard in a world where a non-strict standard already existed, and everybody became very much aware on a daily basis that a non-strict standard is much easier to work with.
I think it's very hard to compete with that.
Seconded. I spent a whole lot of effort making my early-2000s websites emit compliant XHTML because it seemed like the right thing to do, and the way we were inevitably heading. And then I — and apparently almost everyone else, too, at the same time — realized it was a whole lot of busywork with almost nothing to show for it. The only thing XHTML ever added to the mix was a giant error message if you forgot to write "<br/>" instead of "<br>".
Know what? Life's too short to lose time to remembering to close a self-closing tag.
About the time XHTML 1.1 came along, we collectively bailed and went to HTML5, and it was a breath of fresh air.
I don't understand this sentiment. Never have. For years I've doubted myself, wondering whether I'm mistaken that this is how XHTML really fell out of favor, even though it's what I recall reading at the time. A modern web developer is writing in JavaScript or TypeScript, which is going to make you correctly close your curly braces and parentheses (and so much more with TypeScript).
Then React introduced faux-XML as JSX except with this huge machinery of a runtime javascript virtual DOM instead of basic template expansion and everyone loves it? And if this react playground I've opened up reflects reality, JSX seems to literally require you to balance opening/closing your tags. The punch-line of the whole joke.
What was the point of this exercise? Why do people use JSX for e.g. blogs when HTML templating is built into the browser and they do nothing dynamic? For many years it's been hard to shake the feeling that it isn't some trick to justify 6 figure salaries for people making web pages that are simple enough that an 8 year old should be up to the task.
That same nagging feeling reassures me about our AI future though. Easy ways to do things have been here the whole time, yet here we are. I don't think companies are as focused on efficiency as they pretend. Clearly social aspects like empire building dominate.
3 replies →
I think a key factor is: XML offers so many ways to serialize, you always have to decide in each individual case what the structure should be — what's an attribute, what's text content, what's its own element — and those are important choices with impact on later changes.
With JSON you can dump data structures from about any language straight out and it's okay to start toying around and experimenting. Over time you might add logic for filtering out some fields, rename others, move stuff a little around without too much trouble.
Also, quickly writing the structure up by hand works a lot faster in any editor, without having to repeat closing tags (though at some point closing brackets and braces will take their toll).
However I agree: once you got the XML machinery, there is a lot of power in it.
I legitimately tried my best to like XSLT on the web back in the day.
The idea behind XSLT is nice — creating a stylesheet to transform raw data into presentation. The practice of using it was terrible. It was ugly, it was verbose, it was painful, it had gotchas, it made it easier for scrapers, it bound your data more tightly to your presentation, and so on.
Most of the time I needed to generate XML to later apply an XSLT stylesheet, the resulting XML document was mostly a one-off with no associated spec, not a serious transport document. It raised the question of why I was doing this extra work.
Making your data easy to scrape is part of the point (or just more generally work with). If you're building your web presence, you want people to easily be able to find the data on your site (unless your goal is platform lockin).
The entire point of XSLT is to separate your data from its presentation. That's why it made it easy to scrape. You could return your data in a more natural domain model and transform it via a stylesheet to its presentation.
And in doing so it is incredibly concise (mostly because XPath is so powerful).
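As a sketch of that separation (document structure and element names hypothetical), a stylesheet that turns a `<users>` document into an HTML list is little more than one template plus an XPath select:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/users">
    <ul>
      <!-- one <li> per user element, selected via XPath -->
      <xsl:for-each select="user">
        <li><xsl:value-of select="name"/></li>
      </xsl:for-each>
    </ul>
  </xsl:template>
</xsl:stylesheet>
```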
The problem with XML is that it’s horrible to manually read and write, and it takes more effort to parse too. It’s a massive spec which contains footguns (a fair few CVEs exist just because of people using XML on publicly accessible endpoints).
Now I do think there is a need for the complexity supported by XML to exist, but 99% of the time JSON or similar is good enough while being easy to work with.
That all said, XHTML was amazing. I’d have loved to see XHTML become the standard for web markup. But alas that wasn’t to be.
XHTML was too rigid - as a user agent it should try to render a document, rather than tell the user: "tough, the developer screwed up".
So XHTML lost to the much more forgiving HTML.
There was an idea to make a forgiving XML for web use cases: https://annevankesteren.nl/2007/10/xml5 but it never got traction.
I saw the rigidity of XHTML as an asset rather than a problem.
But I do agree that I’m likely in the minority of people (outside of web developers at least) that thought that way.
XML was a great idea but the markup was tedious and verbose.
JSON is too simplistic.
Something built from s-expressions would probably have been ideal but we've known that for 70 years.
What would that look like?
Look up "SXML" for an example of what that might look like. There's many other ways to do it, though.
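Roughly, SXML maps a tag to the head of a list and attributes to an `(@ ...)` sublist, so `<p class="x">hi</p>` becomes something like:

```scheme
; SXML sketch: element name as list head,
; attributes in an (@ ...) sublist, text as strings
(p (@ (class "x"))
   "hi")
```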
Sometimes it gets lost that XML is a document description language like HTML.
Yeah this 100%. JSON took off only partly because it has less tedious syntax. The other big reason is that its data model is actually what you really want 99% of the time. XML has to be squashed awkwardly into the normal object-based data model that most programming languages use.
I actually rather like XML*. But this is one of its warts: it wants to be two things, a markup language (it's in the name, and arguably where it should have stayed) and an object notation. This is where you start to question some of XML's fundamentals — why is it redundant? When do you stick data in attributes? Or is it better to nest tags all the way down?
Asterisk: except namespaces. I loathe those: you are skipping happily along, chewing through your XML, XPathing left and right, and then find out some psychopath has decided to use namespaces, and now everything has become super awkward and formal.
Namespaces are essential whenever you want to insert contents defined by one schema into “payload” or “application-defined” elements of another schema. There are also more complex scenarios where attributes from one schema are used to annotate elements from a different schema.
Well, I guess we could do it like libraries in C-land and have every schema add its own informal identifier prefix to avoid name collisions. But there’s a reason why programming languages moved to namespaces as an explicit notion.
I think you're getting at a very often discussed ebb and flow between being extremely controlled vs extremely flexible. XML was astounding compared to system specific proprietary systems, and then as the need for formalism grew people wanted something simpler... And now you see the same thing growing with JSON and the need for more rigor. I personally think there are many forces to all of this, just the context at the time, prevailing senses of which things are chores and which aren't, companies trying to gain advantage, but probably most importantly is that the vast majority of people have a subset of historical information about systems and computer science, myself included, yet we have to get things done.
> I would blame the adoption of containerization for the lack of interest in XML standards, but by the time containerization happened, XML had been all but abandoned.
It got abandoned because it sucks. New technology gets adopted because it's good. XML standard were just super meh and difficult to work with. There's really not much more to it than that.
I don’t follow why docker killed XML.
> I would blame the adoption of containerization for the lack of interest in XML standards, but by the time containerization happened, XML had been all but abandoned.
Not sure how that is true. XML is a specification for a data format, but you still need to define the schema (i.e., elements, attributes, their meaning). It's not like XML for web pages (XHTML?) could also serve as XML for Linux container descriptions or as XML for Android app manifests.
If XForms was released on browsers today, it would be hailed as a revolutionary technology. Instead, it is just one of the many things thrown away, and even now 20 years after the WHATWG took over we cannot even do a PUT request without Javascript.
What a pity.
For those unaware of XForms: https://www.youtube.com/watch?v=2yYY7GJAbOo
>XML also does not play very nice with streaming technologies.
Not sure why; it streams just as well as JSON. If you are going to stream and parse, you need a low-level push or pull parser, not a DOM, exactly as with JSON. See SAX for Java or XmlReader/XmlWriter in .NET.
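The same pull-parsing style is available in Python's standard library. A minimal sketch using `xml.etree.ElementTree.iterparse`, which yields elements as their end tags arrive instead of building the whole DOM first (the document here is made up):

```python
import io
import xml.etree.ElementTree as ET

# A small document standing in for an arbitrarily large stream.
doc = io.BytesIO(b"<items><item>1</item><item>2</item><item>3</item></items>")

values = []
# iterparse yields (event, element) pairs as the parser advances,
# so each <item> can be handled and discarded without a full DOM.
for event, elem in ET.iterparse(doc, events=("end",)):
    if elem.tag == "item":
        values.append(int(elem.text))
        elem.clear()  # free the element we've already consumed

print(values)  # → [1, 2, 3]
```

The same shape works for multi-gigabyte documents, since memory use stays bounded by whatever you keep between `clear()` calls.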
XSLT 3 even had a streaming mode, I believe, which was badly needed, but it had constraints due to not having the whole document in memory at once.
I liked XSLT, but there is no need for it; JavaScript is good enough if not better. Many times you needed a script block in your XSLT to get something done it couldn't do on its own anyway, so you might as well use a full language with good libraries for handling XML instead. See LINQ to XML, etc.
Right, both sort of suck at streaming; something about being closed-form tree structures would be my guess. (Strictly speaking, you need to close the tree to serialize it, so there's no clean way to append data in real time; the best you can do is leave the structure open and send fragments.) Having said that, I am not really sure what a good native streaming format would look like. My best guess is something flatter, closer to CSV.
>Right, both sort of suck at streaming, something about being closed form tree structures would be my guess(strictly speaking, you need to close the tree to serialize it, so no clean way to append data in real time, best you can do is to leave the structure open and send fragments).
Again, I don't really agree. It's just that most developers don't seem to understand the difference between a DOM (or parsing JSON into a full object) and using a streaming reader or writer, so they need to be hand-fed a format that forces it on them, such as line-based CSV.
Maybe if JSON and XML allowed top level multiple documents / objects it would have helped like JSON lines.
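A sketch of that idea: JSON Lines simply puts one complete JSON document per line, so a reader can process records as they arrive (the record contents here are made up):

```python
import json

# Three complete JSON documents, one per line: valid JSON Lines,
# but not a single valid JSON document.
stream = '{"id": 1}\n{"id": 2}\n{"id": 3}\n'

# Each line parses independently, so this works on an open-ended stream.
ids = [json.loads(line)["id"] for line in stream.splitlines()]
print(ids)  # → [1, 2, 3]
```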
Google didn't want XML to win.
XHTML would have made the Semantic Web (capital letters) possible. Someone else could have done search better. We might have had a proper P2P web.
They wanted sloppy, because only Google scale could deal with that.
Hopefully the AI era might erode that.
The Semantic Web was never going to win. It does not make sense on a fundamental level.
XHTML's failure had nothing to do with it and is basically unrelated. Even if XHTML had won, I fail to see how that would have helped the Semantic Web in any way, shape, or form.
XML died because it sucks. Everyone who had to deal with it back in the day jumped to YAML and/or JSON as quickly as they could. Google didn't cause that, but because they're a search engine they followed it.
I don’t recall Google being the ones to kill XHTML. Got any references to back that claim up?