Comment by CiaranMcNulty
12 hours ago
It's sad how the bloat of '00s enterprise XML made the tech seem outdated and drove everyone to 'cleaner' JSON, because things like XSLT and XPath were very mature and solved a lot of the problems we still struggle with in other formats.
I'm probably guilty of some of the bad practice: I have fond memories of (ab)using XSLT includes back in the day with PHP stream wrappers to have stuff like `<xsl:include href="mycorp://invoice/1234">`
This may be out-of-date bias, but I'm still a little uneasy letting the browser do the transform locally, just because it used to be a minefield of incompatibility.
It's been 84 years but I still miss some of the "basics" of XML in JSON - a proper standards organization, for one. But things like schemas were (or, felt like) so much better defined in XML land, and it took nearly a decade for JSON land to catch up.
Last thing I really did with XML was a technology called EXI, a transfer method that converted an XML document into a compressed binary data stream. Because translating a data structure to ASCII, compressing it, sending it over HTTP etc and doing the same thing in reverse is a bit silly. At this point protobuf and co are more popular, but imagine if XML stayed around. It's all compatible standards working with each other (in my idealized mind), whereas there's a hard barrier between e.g. protobuf/grpc and JSON APIs. Possibly for the better?
I just learned about EXI as it's being used on a project I work on. It's quite amazingly fast and small! It is a binary representation of the XML stream. It can compress quite small if you have an XML schema to go with your XML.
I was curious about how it is implemented and I found the spec easy to read and quite elegant: https://www.w3.org/TR/exi/
That data transform thing XSLT could do was so cool. You could twist it into emitting just about any other format, and XML was the top layer. You want it in tab-delimited YAML? Feed it the right style sheet and there you go. Other system wants CSV? Sure thing, different style sheet and there you go.
For a transport tech XML was OK. It just wasted 20% of your bandwidth on being a text encoding. Plus, wrapping your head around those style sheets was a mind twister. I'm not surprised people despise it, as it has the ability to be wickedly complex for no real reason.
84 years? nope.
The game Rimworld stores all its game configuration data in XML and uses XPath for modding and it's so incredibly good. It's a seriously underrated combination for enabling relatively stable local modifications of data. I don't know of any other game that does this, probably because XML has a reputation of being "obsolete" or whatever. But it's just such a robust system for this use case.
https://rimworldwiki.com/wiki/Modding_Tutorials/PatchOperati...
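For anyone who hasn't seen the pattern, here is a rough sketch of XPath-targeted patching in Python. It is not RimWorld's actual implementation: the `Defs` snippet, the values, and the `apply_replace` helper are all made up for illustration, and ElementTree only understands a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical base-game data, loosely shaped like a Defs file.
BASE = """
<Defs>
  <ThingDef>
    <defName>Steel</defName>
    <statBases><MarketValue>1.9</MarketValue></statBases>
  </ThingDef>
  <ThingDef>
    <defName>Gold</defName>
    <statBases><MarketValue>10</MarketValue></statBases>
  </ThingDef>
</Defs>
"""

def apply_replace(root, xpath, new_text):
    """Hypothetical helper: set the text of every node matched by xpath."""
    matches = root.findall(xpath)
    for node in matches:
        node.text = new_text
    return len(matches)

root = ET.fromstring(BASE)

# A mod targets one value by path instead of shipping a full copy of the file.
patched = apply_replace(root, "ThingDef[defName='Steel']/statBases/MarketValue", "2.5")

steel = root.findtext("ThingDef[defName='Steel']/statBases/MarketValue")
gold = root.findtext("ThingDef[defName='Gold']/statBases/MarketValue")
print(patched, steel, gold)
```

The point is that the patch names the node it wants and leaves everything else alone, so two mods touching different values in the same file don't have to know about each other.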
XML is fine. A bit wordy, but I appreciate its precision and expressiveness compared to YAML.
XPath is kind of fine. It's hard to remember all the syntax but I can usually get there with a bit of experimentation.
XSLT is absolutely insane nonsense and needs to die in a fire.
It depends what you use it for. I worked on an interbank messaging platform that normalised everything into a series of standard XML formats, and then used XSLT for representing data to the client. Common use case: we could rerender data to what a receiver's risk system was expecting in config (not compiled code). You could have people trained in XSLT doing that; they did not need to be more experienced developers. Fixes were fast. It was good for this. Another time I worked on a production pipeline for a publisher of education books. Again, data stored in normalised XML. XSLT is well suited to mangling in that scenario.
That's funny, I would reverse those. I loved XSLT though it took me a long time for it to click; it was my gateway drug to concepts like functional programming and idempotency. XPath is pretty great too. The problem was XML, but it isn't inherent to it -- it empowered (for good and bad) lots of people who had never heard of data normalization to publish data and some of it was good but, like Irish Alzheimer's, we only remember the bad ones.
> bloat of '00s enterprise XML
True, and it's even more sad that XML was originally just intended as a simplified subset of SGML (HTML's meta syntax with tag inference and other shortforms) for delivery of markup on the web and to evolve markup vocabularies and capabilities of browsers (of which only SVG and MathML made it). But when the web hype took over, W3C (MS) came up with SOAP, WS-this and WS-that, and a number of programming languages based on XML including XSLT (don't tell HNers it was originally Scheme but absolutely had to be XML just like JavaScript had to be named after Java; such was the madness).
XPath would have been nice if you didn't have to pedantically namespace every bit of every query.
That… has nothing to do with xpath?
If your document has namespaces, xpath has to reflect that. You can either tank it or explicitly ignore namespaces by foregoing the shorthands and checking `local-name()`.
Ok. Perhaps 'namespace the query' wasn't quite the right way of explaining it. All I'm saying is, whenever I've used XPath, instead of it looking nice like
/bookstore/book/title
it's been some godawful mess like
/*[name()='bookstore']/*[name()='book']/*[name()='title']
... I guess because they couldn't bear to have it just match on tags as they are in the file, and it had to be tethered to some namespace stuff that most people don't bother with. A lot of XML is ad hoc without a namespace defined anywhere.
It's like
Me: Hello XPath, here's an XML document, please find all the bookstore/book/title tags
XPath: *gasps* Sir, I couldn't possibly look for those tags unless you tell me which namespace we are in. Are you some sort of deviant?
Me: oh ffs *googles xpath name() syntax*
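For what it's worth, the behaviour being argued about is easy to reproduce. Below is a small sketch using Python's ElementTree, which only implements a subset of XPath; the document and the `urn:example:books` namespace URI are made up. When the document declares a default namespace, unqualified names match nothing, and you either bind a prefix or use a wildcard. A document with no namespace at all matches on plain tag names.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with a default namespace (the URI is made up).
DOC = """
<bookstore xmlns="urn:example:books">
  <book><title>Dune</title></book>
  <book><title>Emma</title></book>
</bookstore>
"""
root = ET.fromstring(DOC)

# Unqualified names match nothing: the elements live in urn:example:books.
plain = [t.text for t in root.findall("book/title")]

# Option 1: bind a prefix of your choosing to the namespace URI.
ns = {"b": "urn:example:books"}
prefixed = [t.text for t in root.findall("b:book/b:title", ns)]

# Option 2 (Python 3.8+): the {*} wildcard matches any namespace, roughly
# what the *[local-name()='book'] workaround does in full XPath.
wildcard = [t.text for t in root.findall("{*}book/{*}title")]

print(plain, prefixed, wildcard)
```

The prefix in option 1 is local to the query, so it doesn't have to match whatever prefix (if any) the document itself uses.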
In the 2003 book The Art of Unix Programming, the author advocated bespoke text formats and writing parsers for them. Writing XML by hand is on his list of war crimes. Since then, syntax highlighting, autocomplete, and autoformatting have narrowed the effort gap, and tolerant parsers (browsers being the main example) got a bad rap. Would Markdown and YAML exist with modern editors?
However, XML is actually a worse format to transfer over the internet. It's bloated and consumes more bandwidth.
XML is a great format for what it’s intended for.
XML is a markup language system. You typically have a document, and various parts of it can be marked up with metadata, to an arbitrary degree.
JSON is a data format. You typically have a fixed schema and things are located within it at known positions.
Both of these have use-cases where they are better than the other. For something like a web page, you want a markup language that you progressively render by stepping through the byte stream. For something like a config file, you want a data format where you can look up specific keys.
Generally speaking, if you’re thinking about parsing something by streaming its contents and reacting to what you see, that’s the kind of application where XML fits. But if you’re thinking about parsing something by loading it into memory and looking up keys, then that’s the kind of application where JSON fits.
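A minimal sketch of the two parsing styles in Python, assuming two made-up payloads that carry the same readings. The element names, the keys, and the `stream_total` helper are all hypothetical; it's only meant to show the shape of each approach.

```python
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical payloads carrying the same three readings.
XML_BYTES = b"<log><reading>3</reading><reading>5</reading><reading>8</reading></log>"
JSON_TEXT = '{"sensor": "t1", "readings": [3, 5, 8]}'

def stream_total(source):
    """React to each <reading> as it completes, instead of querying a tree."""
    total = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "reading":
            total += int(elem.text)
            elem.clear()  # drop the element's contents once handled
    return total

# XML style: step through the byte stream and react to what you see.
xml_total = stream_total(io.BytesIO(XML_BYTES))

# JSON style: load the whole document, then look things up by key.
doc = json.loads(JSON_TEXT)
json_total = sum(doc["readings"])

print(xml_total, json_total, doc["sensor"])
```

Nothing stops you from loading XML into a tree or streaming JSON with a suitable library; this just shows the default posture each format nudges you toward.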
Check out EXI. It compresses the XML stream into a binary encoding and is quite small and fast:
https://www.w3.org/TR/exi/
Only if you never use compression.
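If you'd rather measure than argue, here is a quick sketch in Python that encodes the same records as JSON and as hand-built XML and gzips both. The records are made up, the XML layout is just one plausible encoding, and real payloads will behave differently, so treat the numbers it prints as a starting point only.

```python
import gzip
import json

# Hypothetical records; names and values are made up for the comparison.
records = [{"id": i, "name": f"item{i}", "price": i * 1.5} for i in range(1000)]

json_raw = json.dumps(records).encode("utf-8")

# One plausible element-per-field XML encoding of the same data.
xml_raw = (
    "<records>"
    + "".join(
        f"<record><id>{r['id']}</id><name>{r['name']}</name>"
        f"<price>{r['price']}</price></record>"
        for r in records
    )
    + "</records>"
).encode("utf-8")

json_gz = gzip.compress(json_raw)
xml_gz = gzip.compress(xml_raw)

for label, raw, packed in (("json", json_raw, json_gz), ("xml", xml_raw, xml_gz)):
    print(f"{label}: {len(raw)} bytes raw, {len(packed)} bytes gzipped")
```

Repeated tag names are exactly the kind of redundancy a general-purpose compressor is good at, which is the argument being made here; how close the two end up depends on your data.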