Comment by airstrike

7 months ago

What problems are {elegantly, neatly, best} solved by using XPath and XSLT today that would make them reasonable choices over alternatives?

21 comments

jerf 7 months ago

XPath is a very nice language for querying over XML. Most places pitch it as a "declarative" syntax, but as I am quite skeptical of "declarative" as a concept, you can also look at the vast majority of the XPath standard as a way to imperatively drive a multicursor over an XML document, diving in and out of nodes and extracting bits of text and such, without having to write the equivalent code in your language to do so, which would inevitably be quite a bit more verbose. When you need it, it's really useful.

In my very opinionated opinion, XPath is about 99% of the value of XSLT, and XSLT itself is a misfire. Embedding an XML language in XML, rather than being an amazing value proposition, is actually a huge and really annoying mistake, in much the same way and for much the same reason that anyone who has spent much time around shell scripting has found that embedding shell strings in shell strings (and, if the situation is particularly dire, a third or fourth level of such nesting) is quite unpleasant. Imagine trying to deal with bash, except you have to first quote all the command lines as bash strings like you're using bash -c, all the time. I think "XPath + your favorite language" has all the power of XSLT and, generally, better ergonomics and comprehensibility. Once you've got the selection of nodes in hand, a general-purpose programming language is a better way to deal with their contents than what XSLT provides. Hence why it has always languished.
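To make the "XPath + your favorite language" idea concrete, here is a minimal sketch in Python using the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath. The sample document and the function name `open_order_quantities` are invented for illustration; they do not come from the comment above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
DOC = """
<orders>
  <order id="1" status="open"><item sku="A1" qty="2"/><item sku="B7" qty="1"/></order>
  <order id="2" status="closed"><item sku="A1" qty="5"/></order>
  <order id="3" status="open"><item sku="C3" qty="4"/></order>
</orders>
"""


def open_order_quantities(xml_text):
    """Select nodes with an XPath-style path, then process them in plain Python."""
    root = ET.fromstring(xml_text)
    totals = {}
    # ElementTree implements only part of XPath; an attribute-equality
    # predicate like the one below is within that supported subset.
    for item in root.findall("./order[@status='open']/item"):
        sku = item.get("sku")
        totals[sku] = totals.get(sku, 0) + int(item.get("qty"))
    return totals


if __name__ == "__main__":
    print(open_order_quantities(DOC))
```

The path expression does the selection; the aggregation is ordinary dictionary code, which is the division of labor the comment argues for.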

int_19h 7 months ago

XQuery is the best of both worlds: you get almost all the benefits of XSLT, e.g. the ability to define your own functions, but with a non-XML-based syntax that is a superset of XPath.
Basically the only thing missing in XQuery vs XSLT is template rules and their application; but IMO simple ones are just as easy to write explicitly, and complex rulesets are hard to reason about and maintain anyway.
jrpelkonen 7 months ago

It’s been a while since I’ve had to deal with XML, but I remember finding it fairly convenient to restructure XML documents with XSLT. Modifying the data in those documents, much less so. I think there’s a sweet spot.
akshayshah 7 months ago

To someone who hasn’t worked much with XML, this seems like a reasonable take!
For cases where a host system wants to execute user-defined data transformations safely, XSLT seems like it might be useful. When they mature, maybe WASM and WASI will fill the same niche with better developer ergonomics?
therealmarv 7 months ago

Interesting take about XSLT. But I agree... XSLT could be something much simpler (and not XML itself) combined with XPath. It feels like a lot of boilerplate code to write XSLT.

password4321 7 months ago

XPATH+XSLT is SQL for XML, declarative selection and transformation.

Using an XML library to iterate through an entire XML document without XPATH is like looping through entire database tables without a JOIN filter or a WHERE clause.

XSLT is the SELECT, transforming XML output with a new level of crazy for recursion.
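A rough sketch of the "WHERE clause" analogy, assuming Python's standard `xml.etree.ElementTree` (which implements only a subset of XPath). The sample document and both function names are hypothetical. The first function scans every element and filters by hand; the second pushes the filter into the path expression.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, invented for this illustration.
LIBRARY = """
<library>
  <shelf name="fiction">
    <book lang="en"><title>Dune</title></book>
    <book lang="fr"><title>Candide</title></book>
  </shelf>
  <shelf name="reference">
    <book lang="en"><title>SICP</title></book>
  </shelf>
</library>
"""


def titles_by_loop(root, lang):
    """Full scan: visit every element and filter by hand."""
    titles = []
    for element in root.iter():
        if element.tag == "book" and element.get("lang") == lang:
            title = element.find("title")
            if title is not None:
                titles.append(title.text)
    return titles


def titles_by_path(root, lang):
    """Same result, with the filter expressed as a path predicate.

    Note: lang is interpolated without escaping, so this sketch assumes
    it contains no quote characters.
    """
    return [t.text for t in root.findall(f".//book[@lang='{lang}']/title")]


if __name__ == "__main__":
    root = ET.fromstring(LIBRARY)
    print(titles_by_loop(root, "en"))
    print(titles_by_path(root, "en"))
```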

mickeyp 7 months ago

XPath is a superb query language for XML (or anything that you can structure as a DOM). It is also, with some obscure exceptions, the only query language with serious adoption, so it's an easy choice and readily available in XML tools. The only caveat is that there are various spec versions, and most tools never added support for the newer ones.

Let's look at JSON by comparison. Hmm, let's see: JSONPath, JMESPath, jq, jsonql, ...

never_inline 7 months ago
jq is the most feature-rich of the bunch. It's the de facto standard, and I usually just default to it because it offers so much: assignment and various builtins such as base64 encoding.
The disadvantage is that it's not easily embeddable in your own programs, so programs often use JSONPath or Go templates instead.
- bbkane 7 months ago
  
  I also don't think there's a specification written for the jq query language, unlike https://jmespath.org/ , which as you mentioned also has more client libraries.
  I too am probably going to embed jmespath in my app. I need it to allow users to fill CLI flags from config files, and it'll replace my crappy homegrown version ( https://github.com/bbkane/warg/blob/740663eeeb5e87c9225fb627... )
  
- BoingBoomTschak 7 months ago
  
  And it's yet another terrible DSL that you must learn when it could have been a language everybody already knows, like Python. The query part isn't even that well done, compared to XPath/JSONPath.
  I said goodbye to it a few weeks ago, personally (https://world-playground-deceit.net/blog/2025/03/a-common-li... https://world-playground-deceit.net/blog/2025/03/speeding-up...)
  
trallnag 7 months ago

Recently discovered JSONata thanks to AWS adding it to Step Functions. Feel free to add it to your enumeration.

Devasta 7 months ago

I manage a team that builds and maintains trading data reports for brokers. We generate everything in a fairly standard format and customize it to each broker's exact needs with XSLT. Hundreds of reports; couldn't manage without it.

therealmarv 7 months ago

E.g. massive, complex XML documents that need to be transformed into other structured XML, or complex XML that you need to parse. Some people hate XSLT and XPath with a passion and would rather write much more complex lxml code. It has a steep learning curve, but once you understand the fundamentals you can transform XML more easily, and especially more predictably and reliably, than ever.

Another example: if you have very large XML documents that do not even fit into memory, you can still stream-process them with XSLT.

It makes you the master of XML transformations and fetching information out of complex XML ;)
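For readers unfamiliar with what stream processing of XML looks like, here is a small sketch of the general idea. It is not XSLT: streaming in XSLT itself is an XSLT 3.0 feature, and this example swaps in Python's standard `xml.etree.ElementTree.iterparse` to show the same principle of handling one element at a time instead of loading the whole tree. The element name `record`, the attribute `amount`, and the function name are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def sum_amounts(stream, tag="record", attr="amount"):
    """Add up an integer attribute over matching elements, one element at a time."""
    total = 0
    # "end" events fire once an element has been fully parsed.
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == tag:
            total += int(elem.get(attr, "0"))
            # Drop this element's children and attributes to limit memory use.
            # The emptied element itself still stays attached to its parent.
            elem.clear()
    return total


if __name__ == "__main__":
    data = io.BytesIO(
        b"<records><record amount='10'/><record amount='5'/><other amount='99'/></records>"
    )
    print(sum_amounts(data))
```

With a real file, the same function could be called with an open binary file object in place of the in-memory buffer.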

jeffbee 7 months ago

What alternatives exist for extracting structured data from the web? I have several ETL pipelines that use htmltidy to turn tag soup into something approximately valid and xmlstarlet to transform it into tabular data.

never_inline 7 months ago

I have used it when scraping data from web pages with the Scrapy framework. It's a reliable way to extract something from web pages compared to regex.

mdaniel 7 months ago
don't overlook the ability to mix and match them, because each "axis" is good at its own things
response.xpath('//div[contains(@data-foo, "foo")]').css(".some-class").re(r"[a-z][a-zA-Z]+")
The .css() flavor gets compiled down into .xpath() but there is no contest about their expressivity: https://github.com/scrapy/parsel/blob/v1.9.1/parsel/csstrans...