Comment by Aurornis

20 hours ago

I think the simplest explanation is that developers used it and did not like it.

The pro-XML narrative always sounded like what you wrote, as far back as I can remember: The XML people would tell you it was beautiful and perfect and better than everything as long as everyone would just do everything perfectly right at every step. Then you got into the real world and it was frustrating to deal with on every level. The realities of real-world development meant that the picture-perfect XML universe we were promised wasn't practical.

I don't understand your comparison to containerization. That feels like apples and oranges.

19 comments


mikepurvis 20 hours ago

HTML was conceived as a language for marking up a document that was primarily text; XML took the tags and attributes from that and tried to turn it into a data serialization and exchange format. But it was never really well suited to that, and it's obvious from looking at XML-RPC or SOAP payloads that there were fundamental gaps in the ability of XML to encode type and structure information inline:

    <?xml version="1.0"?>
    <methodCall>
        <methodName>math.add</methodName>
        <params>
            <param>
                <value><int>5</int></value>
            </param>
            <param>
                <value><int>7</int></value>
            </param>
        </params>
    </methodCall>

Compared to this, JSON had string and number types built in:

    {
        "jsonrpc": "2.0",
        "method": "math.add",
        "params": [5, 7],
        "id": 1
    }

I don't think this is the only factor, but I think XML had a lot of this kind of cognitive overhead built in, and that gave it a lot of friction when stacked up against JSON and later YAML. And when it came to communicating with a SPA, it was hard to compete with JS being able to natively eval the payload responses.
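
To make the difference concrete, here is a minimal sketch that pulls the two operands out of both payloads above using only the Python standard library. The helper names `xml_params` and `json_params` are made up for illustration, and the XML walker only handles the `<int>`/`<i4>` and untyped cases rather than the full XML-RPC type set.

```python
import json
import xml.etree.ElementTree as ET

XML_PAYLOAD = """<?xml version="1.0"?>
<methodCall>
    <methodName>math.add</methodName>
    <params>
        <param>
            <value><int>5</int></value>
        </param>
        <param>
            <value><int>7</int></value>
        </param>
    </params>
</methodCall>"""

JSON_PAYLOAD = '{"jsonrpc": "2.0", "method": "math.add", "params": [5, 7], "id": 1}'


def xml_params(payload):
    """Hypothetical helper: walk params/param/value and decode each typed child."""
    root = ET.fromstring(payload)
    values = []
    for value in root.findall("./params/param/value"):
        if len(value) == 0:
            # No type element: XML-RPC treats the content as a string.
            values.append(value.text or "")
            continue
        typed = value[0]  # the child element's tag carries the type, e.g. <int>
        if typed.tag not in ("int", "i4"):
            raise ValueError(f"type not handled in this sketch: {typed.tag}")
        values.append(int(typed.text))
    return values


def json_params(payload):
    """Hypothetical helper: the parser already hands back typed numbers."""
    return json.loads(payload)["params"]
```

Both helpers return `[5, 7]` for the payloads above, but the XML version has to interpret element names as types by hand, while the JSON version gets numbers straight from the parser.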

UlisesAC4 18 hours ago
To be fair, I cannot trust the shape of your JSON-RPC payload. I am not sure whether id is truly an integer or whether you sent me an integer by mistake, and the same goes for params, or even for the contents of each param. This is why we ended up adopting OpenAPI for describing HTTP interactions, and IIRC JSON-RPC specifically can also be described with it. At least in the schema part no one would say it is ambiguous. One also does not need heavier parsing: the object is a tree, so there is no more checking on escaping strings, no more issues with hand-coded multiline strings, and no need to separate attributes with commas, since we know the end tag delimits a space, and so on.
- Aurornis 2 hours ago
  
  > To be fair I cannot trust your shape in your jsonrpc, I am not sure if id is truly an integer or if you sent me an integer by mistake, same as params or even the payload of the params' param
  In practice, it doesn't matter.
  If the JSON payload is in the wrong format the server rejects it with an error.
  If the server sends an integer "by mistake" then the purists would argue that the client should come to a halt and throw up an error to the user. Meanwhile the JSON users would see an integer coming back for the id field and use it, delivering something that works with the server as it exists today. Like it or not, this is why JSON wins.
  Schema-defined protocols are very useful in some circumstances, but in my experience the work of keeping them in sync everywhere and across developers is a lot of overhead for most simple tasks.
  Putting the data into a simple JSON payload and sending it off gets the job done in most cases.
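
As a rough illustration of "the server rejects it with an error", here is a minimal sketch of a hand-written shape check for the `math.add` request shown earlier in the thread, with no schema tooling involved. The function name `parse_add_request` and the exact rules it enforces are assumptions for the example, not part of JSON-RPC itself.

```python
import json


def parse_add_request(raw):
    """Hypothetical server-side check: return (id, params) or raise ValueError."""
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not JSON: {exc}") from exc

    if not isinstance(msg, dict) or msg.get("method") != "math.add":
        raise ValueError("unexpected method")

    params = msg.get("params")
    if not isinstance(params, list) or len(params) != 2:
        raise ValueError("params must be a list of two numbers")

    for p in params:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(p, bool) or not isinstance(p, (int, float)):
            raise ValueError("params must be numbers")

    # The id is passed through untouched; this sketch does not care about its type.
    return msg.get("id"), params
```

A payload like `{"method": "math.add", "params": ["5", 7], "id": 1}` is rejected with a `ValueError`, which is the whole validation story for a simple endpoint. The trade-off is that every such check is written and maintained by hand, which is exactly what schema tooling is meant to replace.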
pyuser583 13 hours ago

Yeah this is the issue. I spent tons of time writing code that would consume xml and turn it into something useful.
It’s a mediocre data storage language.

smarx007 20 hours ago

> developers used it and did not like it.

This makes sense.

However, there are two ways to address it:

1) Work towards a more advanced system that addresses the issues (for example, RDF/Turtle – expands XML namespaces to define classes and properties, represents graphs instead of being limited to trees unlike XML and JSON)

2) Throw it away and start from scratch. First, JSON. Then, JSON schema. Jq introduces a kind of "JSONPath". JSONL says hi to XML stream readers. JSONC because comments in config files are useful. And many more primitives that existed around XML were eventually reimplemented.

Note how the discussion around removing XSLT 1 support similarly has two ways forward: yank it out or support XSLT 3.

I lean towards Turtle replacing XML over JSON, and for XSLT 3 to replace XSLT 1 support in the browsers.

mpyne 20 hours ago
> And many more primitives that existed around XML were eventually reimplemented.
Don't miss that they were reimplemented properly.
Even XML schemas, the one thing you'd think they were great at, ended up seeing several different implementations beyond the original DTD-based schema definitions and beyond XSD.
Some XML things were absolute tire fires that should have been reimplemented even earlier, like XML-DSIG, SAML, SOAP, WS-everything.
It's not surprising devs ended up not liking it; there are actual issues trying to apply XML outside of its strengths. As with networking and the eventual conceit of "smart endpoints, dumb pipes" over ESBs, not all data formats are better off being "smart". Oftentimes the complexity of the business logic is better off in the application layer where you can use a real programming language.
- smarx007 19 hours ago
  
  > Even XML schemas, the one thing you'd think they were great at
  Of course not! W3C SHACL shapes, on the other hand...
  schema.org is also a move in the right direction

themafia 20 hours ago

The simplest explanation is that attributes were a mistake. They add another layer to the structure and create confusion as to where data is best stored within it.

XML without attributes probably would have seen wide and ready adoption.

pyuser583 13 hours ago
I see it as the opposite. Attributes weren’t used enough. The result was unnecessarily nested code.
“Keep things flat” is current good advice in terms of usability. That means favor attributes over children.
- pests 12 hours ago
  
  I agree. A sibling thread showed an example of XML above containing params/param/value/int nodes, which with attributes could just be <param type="int">.
  I do agree that attributes/data was always a huge contention point on where things should go and caused confusion and bikeshedding.
  I also saw a bit of this in the React/JSX community with decisions like render props, HoC, etc where it took a bit to stabilize on best practices.
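
For comparison, here is a minimal sketch of what the flatter, attribute-based layout could look like and how it might be read. This document shape is hypothetical and is not XML-RPC; the names `FLAT_PAYLOAD`, `CASTS`, and `flat_params` are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat layout: the type lives in an attribute instead of a nested element.
FLAT_PAYLOAD = """<methodCall name="math.add">
    <param type="int">5</param>
    <param type="int">7</param>
</methodCall>"""

# Map the assumed type names to Python constructors.
CASTS = {"int": int, "double": float, "string": str}


def flat_params(payload):
    """Decode each <param> using its type attribute, defaulting to string."""
    root = ET.fromstring(payload)
    return [
        CASTS[p.get("type", "string")](p.text or "")  # unknown types raise KeyError
        for p in root.findall("param")
    ]
```

Here `flat_params(FLAT_PAYLOAD)` returns `[5, 7]` with two levels of nesting instead of four, though the reader still has to agree with the writer on what the `type` attribute values mean.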

bawolff 18 hours ago

While I think a lot of XML was a bad idea, some of the issues are not intrinsically the fault of XML but of some really poor design decisions by people making XML-based languages.

They tended to be design-by-committee messes that included every possible use case as an option.

Anyone who has ever had the misfortune of having to deal with SAML knows what I'm talking about. It's a billion-line-long specification, everyone only implements 10% of it, and it's full of hidden gotchas that will screw up your security if you get them wrong. (Even worse, the underlying xml-signature spec is literally the worst way to do digital signatures possible. It's so bad you'd think someone was intentionally sabotaging it.)

In theory this isn't xml's fault, but somehow XML seems to attract really bad spec designers.

pyuser583 13 hours ago

Regarding containerization, XML wouldn’t just be a noun, but a verb (like in XSLT). You would define your remote procedures in XML.

Imagine if instead of the current Dockerfile format, we used XML, which was dynamically generated from lists of packages, and filtered and updated according to RSS feeds describing CSVs and package updates.

I’m not saying this is anything other than strange fantasy. And not a particularly nice fantasy either.

XML failed because it forced devs to spend tons of unproductive time on it.

mattmanser 20 hours ago

Part of the problem was it came in an era before we really understood programming, as a collective. We didn't even really know how to encapsulate objects properly, and you saw it in poor database schema designs, bizarre object inheritance patterns, poorly organised APIs, even the inconsistent method param orders in PHP. It was everywhere. Developers weren't good at laying out even POCOs.

And those bizarre designs went straight into XML, properties often in attributes, nodes that should have been attributes, over nesting, etc.

And we blamed XML for the mess where often it was just inexperience in software design as an industry that was the real cause. But XML had too much flexibility compared to the simplicity of the later JSON, meaning it helped cause the problem. JSON 'solved' the problem by being simpler.

But then the flip side was that it was too strict and starting one in code was a tedious pita where you had to specify a schema even though it didn't exist or even matter most of the time.

toyg 20 hours ago
Nah, we still have all those issues and more.
The hard truth is that XML lost to the javascript-native format (JSON). Any JavaScript-native format would have won, because "the web" effectively became the world of JavaScript. XML was not js-friendly enough: the parsing infrastructure was largely based on C/C++/Java, and then you'd get back objects with verbose interfaces (again, a c++/java thing) rather than the simple, nested dictionaries that less-skilled "JS-first" developers felt at ease with.
- mpyne 20 hours ago
  
  The thing is, JSON is even superior in C++.
  It's a dumber format but that makes it a better lingua franca between all sorts of programming languages, not just Javascript, especially if you haven't locked in on a schema.
  Once you have locked in on a schema and IDL-style tooling to autogenerate adapter classes/objects, then non-JSON interchange formats become viable (if not superior). But even in that world, I'd rather have something like gRPC over XML.
- em-bee 19 hours ago
  
  that's the thing, XML should have become javascript native so that we could write inline HTML more easily like JSX from react allows us to do.
  
Aurornis 20 hours ago

This is the abstract idealism I was talking about: Every pro-XML person I've talked to wants to discuss XML in the context of a hypothetical perfect world of programming that does not exist, not the world we inhabit.
The few staunch XML supporters I worked with always wanted to divert blame to something else, refusing to acknowledge that maybe XML was the wrong tool for the job or even contributing to the problems.