Comment by vessenes
5 months ago
This, thirty years later, is the best pitch for XML I’ve read. Essentially, it’s a slow moving, standards-based approach to data interoperability.
I hated it the minute I learned about it, because it missed something I knew I cared about, but didn’t have a word for in the 90s - developer ergonomics. XML sucks shit for someone who wants to think tersely and code by hand. Seriously, I hate it with a fiery passion.
Happily, to my mind, the economics of easier-for-creators won out: make web browsers and rendering engines just DEAL with weird HTML, or else let people use terse data specs like JSON. And we have a better and more interesting internet because of it.
However, I’m old enough now to appreciate there is a place for very long-standing standards in the data and data transformation space, and if the XML folks want to pick up that banner, I’m for it. I guess another way to say it is that XML has always seemed to be a data standard which is intended to be what computers prefer, not people. I’m old enough to welcome both, finally.
> XML has always seemed to be a data standard which is intended to be what computers prefer, not people.
On one hand, you aren't wrong: XML has in fact been used for machine-to-machine communication mostly. OTOH, XML was just introduced as a subset of SGML doing away with the need of vocabulary-specific markup declarations for mere parsing in favor of always requiring explicit start- and end-element tags. Whereas HTML is chock full of SGMLisms such as tag inference (for example inferring paragraph ends on block elements), empty ("self-closing") elements and enumerated ("boolean") attributes driven by per-element declarations.
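A rough illustration of that difference, using only Python's standard library (the sample markup is made up for the example): a strict XML parser rejects a paragraph with an omitted end tag, while a lenient HTML tokenizer accepts it. Note that `html.parser` only tokenizes; it does not perform the tag inference that an SGML or HTML5 parser would.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample: two paragraphs whose end tags are omitted, as HTML allows.
markup = "<div><p>one<p>two</div>"


def xml_accepts(text):
    """Return True if the strict XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


class TagLogger(HTMLParser):
    """Records start and end tags without requiring that they match up."""

    def __init__(self):
        super().__init__()
        self.starts = []
        self.ends = []

    def handle_starttag(self, tag, attrs):
        self.starts.append(tag)

    def handle_endtag(self, tag):
        self.ends.append(tag)


logger = TagLogger()
logger.feed(markup)
logger.close()

print(xml_accepts(markup))                             # omitted </p> tags
print(xml_accepts("<div><p>one</p><p>two</p></div>"))  # explicit end tags
print(logger.starts, logger.ends)
```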
One can argue to death whether the web should work as a mere document delivery network with rigid markup a la XML, or that browsers should also directly support SGML authoring idioms such as the above shortform mechanisms. SGML also has text macros/shared fragments (entities) and even allows defining own parsing tokens for markdown, math, CSV, or custom syntaxes. HTML leans towards SGML in that its documentation portrays HTML as an authoring language, but browsers are lacking even in basic SGML features such as entities.
That’s a flame war that’s been raging for decades for sure.
I do wonder what web application markup would look like today if designed from scratch. It is kind of amazing that HTML and CSS can be used for creating beautiful documents viewable on pretty much any device with a screen AND also for creating dynamic applications with pixel-perfect rendering, special effects, integrations with the device’s hardware, and even external peripherals.
If there was ever scope creep in a project this would be it. And given the recent discussion on here of curses based interfaces it reminded me just how primitive other GUI application layout tools can be while still achieving amazing results. Even something like GTK does not need the intense level of layout engine support and yet is somehow considered richer in some ways and probably more performant for a lot of stuff that’s done with it.
So I am curious what web application development would look like today if it wasn’t for HTML being “good enough”.
Had we had better process isolation in the mid-90s, I assume web application development would mostly be Java apps, with a mini-vm for each one (sort of a qubes like environment).
We just couldn't keep apps' hands out of the cookie jar back then.
If a browser was designed from scratch today it wouldn't have a markup language, documents would be PDF and everything else would be Javascript to canvas.
Suggesting something like HTML would have you laughed out of the room.
"This, thirty years later, is the best pitch for XML I’ve read."
I wish someone would write "XML - The Good Parts".
Others might argue that this is JSON but I'd disagree:
- No comments is a non-starter
- No proper integers
- No date format
- Schema validation is a primitive toy compared to what we had for XML
- Lack of allowed trailing commas
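Several of these points are easy to check with a strict parser. A small sketch using Python's `json` module (the sample inputs are invented for illustration); the integer case is shown by converting to a double, since many JSON consumers, JavaScript among them, read numbers that way, whereas Python's own parser keeps integers exact:

```python
import json
from datetime import date


def parses(text):
    """Return True if the strict JSON parser accepts the text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


def serializes(value):
    """Return True if the value can be written as JSON without a custom encoder."""
    try:
        json.dumps(value)
        return True
    except TypeError:
        return False


comment_ok = parses('{"a": 1 // note\n}')        # comments are not part of JSON
trailing_comma_ok = parses("[1, 2,]")             # neither are trailing commas
date_ok = serializes({"when": date(2024, 1, 1)})  # no native date type

# An IEEE 754 double cannot represent every integer above 2**53.
big = 2**53 + 1
survives_double = float(big) == big

print(comment_ok, trailing_comma_ok, date_ok, survives_double)
```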
YAML ain't better. I hated whitespace handling in XML, it's a miracle how YAML could make it even worse.
XML is from an era long past and I certainly don't want to go back there, but it had its good parts and I feel we have not really learned a lot from its mistakes.
In the end maybe it is just that developer ergonomics is largely a matter of taste and no language will ever please everyone.
It's funny to hear people in the comments here talk about XML in the past tense.
I know it's passé in the web dev world, but in my work we still work with XML all the time. We even have work in our queue to add support for new data sources built on XML (specifically QIF https://qifstandards.org/).
It's fine with me... I've come to like XML. It's nice to have a standard, easy way to do schemas, validators, processors, queries, etc. It can be overdone and it's not for every use case, but it's pretty good at what it does.
I've come to think that XML will be with us for decades and probably follow us when we leave the small blue planet.
In my military work, I've heard the senior project managers refer to a modern battleship as a floating XML document.
> I know it's passé in the web dev world...
That is because the web dev world is unfortunately obsessed with the current thing. They chase trends like their lives depend on it.
Developer ergonomics is drastically underappreciated, even in modern times. Since we're talking about textual data formats, I'll go out on a limb here and say that I hate YAML. Double checking exactly how many spaces are present on each line is tedious. It manages to make a simple task like copy-pasting something from a different file (at a different indentation level) into an error-prone process. I'll take angle brackets any day.
You haven’t felt hate until you’ve counted spaces in your Helm templates in order to know what value to put after `nindent`. The punchline is that k8s doesn’t even speak yaml, the protocol is all json and it’s the tooling that inflicts yaml on us. I can live with yaml as a config format, but once logic starts creeping in, give me anything else.
Working with large YAML documents is incredibly annoying and shows the benefit of closing tags.
It all went downhill after we stopped using .ini files
JSON5 is a real sweet spot for me. Closing brackets, but I don't have to type every tag twice. Comments and trailing commas.
I find for deeply hierarchical data that XML is much easier to read.
> Developer ergonomics is drastically underappreciated, even in modern times.
When was the last time you had an editor that wouldn't just auto-close the current tag with "</"? I mean, it's a godsend for knowing where you are in a large structure. You aren't scrolling to the top to find which tag you are in.
>XML has always seemed to be a data standard which is intended to be what computers prefer, not people
Interesting take, but I'm always a little hesitant to accept any anthropomorphizing of computer systems.
Isn't it always about what we can reason and extrapolate about what the computer is doing? Obviously computers have no preference so it seems like you're really saying
"XML is a poor abstraction for what it's trying to accomplish" or something like that.
Before jQuery, Chrome, and web 2.0, I was building XSLT-driven web pages that transformed XML in an early NoSQL doc store into HTML. It worked quite beautifully and allowed us to skip a lot of schema work that we definitely were not ready or knowledgeable enough to do.
EDIT: It was the perfect abstraction and tool for that job. However the application was very niche and I've never found a person or team who did anything similar (and never had the opportunity to do anything similar myself again)
I did this for many years at a couple different companies. As you said it worked very well especially at the time (early 2000’s). It was a great way to separate application logic from presentation logic especially for anything web based. Seems like a trivial idea now but at the time I loved it.
In fact the RSS reader I built still uses XSLT to transform the output to HTML as it’s just the easiest way to do so (and can now be done directly in the browser).
Re xslt based web applications - a team at my employer did the same circa 2004. It worked beautifully except for one issue: inefficiency. The qps that the app could serve was laughable because each page request went through the xslt engine more than once. No amount of tuning could fix this design flaw, and the project was killed.
Names withheld to protect the guilty. :)
Most every request goes through xslt in our team's corporate app. The other app teams are jealous of our performance.
> developer ergonomics
That was a huge reason JSON took over.
Another reason was that the overall XML ecosystem grew unwieldy and difficult to navigate: XPath, XSLT, SOAP, WSDL, XPointer, XLink, XForms... They all made sense in their own way, but it was difficult to master them all. That complexity, plus the poor ergonomics, is what paved the way for JSON to become preferred.
I quite liked it when it first came out, I'd been dealing with a ton of bespoke formats up until then. Pretty much every one was ambiguous and painful to deal with. It was a step forward being able to push people towards a standard for document transfer.
I suspect it was SOAP and WSDL that killed it for a lot of people though. That was a typical example of a technical solution looking for a problem and complete overkill for most people.
The whole namespace thing was probably a step too far as well.
You should try using a LISP like Racket for XML. Because XML can be expressed directly as S-expressions, XML and LISP go together like peanut butter and jelly.
In my experience, at least with Clojure, it's much more convenient to serialize XML into a map-like structure. With your example, the data structure would look like so.
Some people use namespaced keywords (e.g. :xml/tag) to help disambiguate keys in the map. This kind of data structure tends to be more convenient than dealing with plain sexps or so-called "Hiccup syntax". i.e.
The above syntax is convenient to write, but it's tedious to manipulate. For instance, one needs to dispatch on types to determine whether an element at some index is an attribute map or a child. By using the former data structure, one simply looks up the :attrs or :content key. Additionally, the map structure is easier to depth-first search; it's a one-liner with the tree-seq function.
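The trade-off between the two shapes can be sketched in Python rather than Clojure. The element below is a made-up example, not the one from the thread; `walk` is a hypothetical helper in the spirit of `tree-seq`, and `hiccup_parts` shows the type dispatch the positional form requires.

```python
# Hypothetical element: <greeting lang="en">hello <b>world</b></greeting>

# Map-like shape: every element is a dict with the same fixed keys.
as_map = {
    "tag": "greeting",
    "attrs": {"lang": "en"},
    "content": ["hello ", {"tag": "b", "attrs": {}, "content": ["world"]}],
}

# Hiccup-like shape: [tag, optional attrs dict, *children].
as_hiccup = ["greeting", {"lang": "en"}, "hello ", ["b", "world"]]


def walk(node):
    """Depth-first traversal of the map shape: yield a node, then its descendants."""
    yield node
    if isinstance(node, dict):
        for child in node["content"]:
            yield from walk(child)


def hiccup_parts(node):
    """Split a hiccup-style list; a type check is needed to find the attrs."""
    tag, rest = node[0], node[1:]
    if rest and isinstance(rest[0], dict):
        return tag, rest[0], rest[1:]
    return tag, {}, rest


# With the map shape, attributes and children are plain key lookups.
tags = [n["tag"] for n in walk(as_map) if isinstance(n, dict)]
print(tags)
print(hiccup_parts(as_hiccup))
```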
I've written a rudimentary EPUB parser in Clojure and found it easier to work with zippers than any other data structure to e.g. look for <rootfile> elements with a <container> ancestor.
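For comparison, the same lookup can be done without zippers. This sketch uses Python's `xml.etree.ElementTree` path queries instead, so it illustrates the task, not the zipper technique; the document is a minimal stand-in for an EPUB's `META-INF/container.xml`, and the `full-path` value is invented.

```python
import xml.etree.ElementTree as ET

# Minimal stand-in for META-INF/container.xml; the path value is made up.
CONTAINER_XML = """<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""

NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}

root = ET.fromstring(CONTAINER_XML)

# root is the <container> element, so every match below has it as an ancestor.
rootfiles = root.findall(".//c:rootfile", NS)
paths = [el.get("full-path") for el in rootfiles]
print(paths)
```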
Zippers are available in most programming languages, thankfully, so this advantage is not really unique to Clojure (or another Lisp). However, I will agree that something like sexps (or Hiccup) is more convenient than e.g. JSX, since you are dealing with the native syntax of the language rather than introducing a compilation step and non-standard syntax.
I have not looked into the use of zippers for this purpose, but I will do so!
Racket has helper libraries like TxExpr (https://docs.racket-lang.org/txexpr/index.html) that make it pretty easy to manipulate S-expressions of this kind.
This looks like it loses the distinction between attributes and nested tags?
As in, I don't see a difference between `(attr "val")` which expresses an attribute key/value pair and `(thing "world")` which expresses a tag/content relationship. Even if I thought the rule might be "if the first element of the list is a list itself then it should be interpreted as a set of attribute key value pairs" then it would still be ambiguous with:
which could serialize to either:
or:
In fact, this ambiguity between attributes and children has always been one of the head scratching things for me about XML. Well, the thing I've always disliked the most is namespaces but that is another matter.
There's no ambiguity. The first element is a symbol that's the name of a tag. If the second element is a list of two element symbol + string lists, it's the attributes. If it's one of the other recognized types, it's part of the contents of the tag.
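That rule is small enough to write down. A sketch in Python, assuming nested lists stand in for S-expressions and a hypothetical `Sym` class stands in for Lisp symbols so that tag names can be told apart from text; it is not Racket's implementation.

```python
from xml.sax.saxutils import escape, quoteattr


class Sym(str):
    """Stands in for a Lisp symbol, so tag names are distinguishable from text."""


def is_attr_list(item):
    """True for a list of [symbol, string] pairs. An element starts with a symbol,
    not a list, so it can never match."""
    return isinstance(item, list) and all(
        isinstance(pair, list)
        and len(pair) == 2
        and isinstance(pair[0], Sym)
        and isinstance(pair[1], str)
        and not isinstance(pair[1], Sym)
        for pair in item
    )


def to_xml(node):
    """Serialize a nested-list element (or a text string) to XML text."""
    if not isinstance(node, list):
        return escape(node)
    tag, rest = node[0], node[1:]
    attrs = []
    if rest and is_attr_list(rest[0]):
        attrs, rest = rest[0], rest[1:]
    attr_text = "".join(f" {name}={quoteattr(value)}" for name, value in attrs)
    children = "".join(to_xml(child) for child in rest)
    return f"<{tag}{attr_text}>{children}</{tag}>"


with_attr = [Sym("tag"), [[Sym("attr"), "val"]], [Sym("thing"), "world"]]
with_child = [Sym("tag"), [Sym("attr"), "val"]]
print(to_xml(with_attr))
print(to_xml(with_child))
```

The extra level of nesting is what separates the two cases: `[[attr, "val"]]` is a list of pairs, while `[attr, "val"]` is an element named `attr`.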
See a grammar for the representation at https://docs.racket-lang.org/xml/index.html#%28def._%28%28li...
Most Scheme tools for working with XML use a different layout where a list starting with the symbol @ indicates attributes. See https://en.wikipedia.org/wiki/SXML for it.
> In fact, this ambiguity between attributes and children has always been one of the head scratching things for me about XML. Well, the thing I've always disliked the most is namespaces but that is another matter.
Just remember that it's a markup language, and then it's not head-scratching at all: the text is the text being marked up, and the attribute values are the attributes of the markup - things like colour and font.
When it was co-opted to store structured data, those people didn't obey this rule (which would make everything attributes).
Namespaces had a very cool use in XHTML: you could just embed an SVG or MathML directly in your HTML and the browser would render it. This feature was copied into HTML5.
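What the namespace buys is visible to any namespace-aware parser. A minimal sketch with Python's `xml.etree.ElementTree` (the document is a made-up example; `split_name` is a hypothetical helper): each element is reported with the vocabulary it belongs to, which is how a consumer can tell SVG from XHTML inside a single file.

```python
import xml.etree.ElementTree as ET

# Made-up XHTML document with an SVG drawing embedded via its own namespace.
doc = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p>A circle:</p>
    <svg xmlns="http://www.w3.org/2000/svg" width="20" height="20">
      <circle cx="10" cy="10" r="8"/>
    </svg>
  </body>
</html>"""


def split_name(tag):
    """Split ElementTree's '{namespace}local' form into (namespace, local)."""
    namespace, _, local = tag[1:].partition("}")
    return namespace, local


root = ET.fromstring(doc)
names = [split_name(el.tag) for el in root.iter()]
svg_elements = [local for ns, local in names if ns == "http://www.w3.org/2000/svg"]

for namespace, local in names:
    print(f"{local:8} {namespace}")
```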
a lisp... like dsssl ? ;-)
I used to do a lot of XSLT coding, by hand, in text editors that weren't proper IDEs, and frankly it wasn't very hard to do.
There's something very zen-like with this language; you put a document in a kind of sieve and out comes a "better" document. It cannot fail; it can be wrong, full of errors, of course (although if you're validating the result against a schema it cannot be very wrong); but it will almost never explode in your face.
And then XSLT work kind of disappeared; I miss it a lot.
I'm gonna be honest, I find terseness to be highly overrated by programmers. I value it in moderation, but for a lot of people they say things like "this language is verbose" like that is a problem unto itself. If verbosity is gaining you something (generally clarity), then I think that's a reasonable cost to pay. Terseness is not, in my opinion, a goal unto itself (though many programmers certainly treat it as such). It's something you should seek only to the extent that it makes a language easier to use.
And not only does the XML format have bad developer ergonomics, most XML parsers are equally terrible to use. There are many things I like about XML: namespaces, schemas, XPath, to some degree even XSLT. But the typical XML developer experience is terrible on every layer.
XML is a big improvement over YAML.
There, I said it.
YAML is great. For simple configuration files. For anything more complex it gets gnarly quick, but honestly? If I need a config file for a script I'm writing I will reach for YAML every time. It really is amazing for that use case.
I find yaml tolerable for cases where ini would have been just as good. Anything else, and… no, it’s bad.
CSV encoded in EBCDIC is an improvement over YAML. God what an awful format...
> XML sucks shit for someone who wants to think tersely and code by hand. Seriously, I hate it with a fiery passion.
At the risk of glibly missing the main point of your comment, take a look at KDL. Unlike JSON/TOML/YAML, it features XML-style node semantics. Unlike XML, it's intended to be human-readable and writeable by hand. It has specifications for both a query language and a schema language as well as implementations in a bunch of languages. https://kdl.dev/
The main thing I hate about XML (apart from the tedious syntax and terrible APIs - who thought SAX was a sane idea?) is that the data model is wrong for 99% of use cases.
XML gives you an object soup where text objects can be anywhere and data can be randomly stored in tags or attributes.
It just doesn't at all match the object model used by basically all programming languages.
I think that's a big reason JSON is so successful. It's literally the object model used by JavaScript. There's no weird impedance mismatch between the data represented on disk and in your program.
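A small sketch of that mismatch using Python's standard library (both documents are invented examples): the XML version scatters its text across `.text` and `.tail` slots and puts one value in an attribute, while the JSON version loads straight into a dictionary.

```python
import json
import xml.etree.ElementTree as ET

# Made-up mixed-content XML: text sits before, inside, and after child elements.
xml_doc = '<note priority="high">Call <b>Alice</b> before <i>noon</i>.</note>'
note = ET.fromstring(xml_doc)

# The text is spread over the parent's .text and each child's .text and .tail.
pieces = [note.text] + [part for child in note for part in (child.text, child.tail)]
priority = note.get("priority")  # this value lives in an attribute instead

# The JSON version maps directly onto the language's own data structures.
json_doc = '{"priority": "high", "who": "Alice", "when": "noon"}'
data = json.loads(json_doc)

print(pieces, priority)
print(data["who"], data["when"])
```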
Then someone had to go and screw things up with YAML...
JSON5 is the way.