Comment by badmintonbaseba

9 hours ago

I have worked for a company that was (and probably still is) heavily invested in XSLT for XML templating. It's not good, and they would probably migrate away from it if they could.

  1. Even though there are newer XSLT standards, XSLT 1.0 is still dominant. It is quite limited and weird compared to the newer standards.

  2. Resolving performance problems in XSLT templates is hell. XSLT is a Turing-complete, functional-style language with performance very much abstracted away. We had XSLT templates that worked fine for most documents, but then one document came in with a ~100-row table and it blew up. It turned out the template that processed the table was O(N^2) or worse, with no obvious way to optimize it (it may even have evaluated an XPath on each row that was itself O(N) or worse; see the sketch below). I don't remember exactly how it manifested, but as I recall that document took XSLT more than 7 minutes to process.

JS might have other problems, but not being able to resolve algorithmic complexity issues is not one of them.
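
To illustrate the kind of pattern that bites here, a minimal sketch with made-up element names (not the actual template that caused the problem): a template that evaluates an XPath over all preceding siblings for every row does O(N) work per row, so the table as a whole is O(N^2).

    <?xml version="1.0"?>
    <xsl:stylesheet version="1.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <!-- Hypothetical sketch: for each <row>, count earlier rows with the same id. -->
      <!-- The preceding-sibling:: scan is O(N) per row, so the whole table is O(N^2). -->
      <xsl:template match="row">
        <tr>
          <td><xsl:value-of select="@id"/></td>
          <td>
            <xsl:value-of
              select="count(preceding-sibling::row[@id = current()/@id])"/>
          </td>
        </tr>
      </xsl:template>
    </xsl:stylesheet>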

XSLT is not easy. It's Prolog on shrooms, so to speak, and it has a steep learning curve. Once mastered it gives Sudoku-level satisfaction, but it can hardly ever be a standard approach to building or templating, as people normally need much less to achieve their goals.

Besides, XML is not universally loved.

Generally speaking, it's part of the problem with the entire "XML as a savior" mindset of that earlier era, and a big reason why we left it behind, whether XSLT, SOAP, or even XHTML in a way... Those were defined as machine languages meant for machines talking to machines, and invariably something goes south and they're not really made for us to intervene in the middle. It can be done, but it's way more work than it should be, especially since they were clearly never designed around the idea that those machines would sometimes speak "wrong", or in a different "dialect".

It looks great, then you design your stuff and it goes great, then you deploy to the real world and everything catches fire instantly, and every time you put out one fire another one starts.

  • It was very odd that a simple markup language was somehow seen as the savior for all computing problems.

    Markup languages are a fine, useful, and powerful way of modeling documents, as in narrative documents with structure meant for human consumption.

    XML never had much to recommend it as the general purpose format for modeling all structured data, including data meant primarily for machines to produce and consume.

  • > Generally speaking, it's part of the problem with the entire "XML as a savior" mindset of that earlier era, and a big reason why we left it behind

    Generally speaking I feel like this is true for a lot of stuff in programming circles, XML included.

    New technology appears, some people play around with it. Others come up with using it for something else. Give it some time, and eventually people start putting it everywhere. Soon "X is not for Y" blogposts appear, and usage finally starts to decrease as people rediscover "use the right tool for the right problem". Wait yet some more time, and a new technology appears, and the same cycle begins again.

    Seen it with so many things by now that I think "we'll" (the software community) forever be stuck in this cycle and the only way to win is to explicitly jump out of the cycle and watch it from afar, pick up the pieces that actually make sense to continue using and ignore the rest.

    • A controversial opinion, but JSON is that too. Not as bad as XML was (̶t̶h̶e̶r̶e̶'̶s̶ ̶n̶o̶ ̶"̶J̶S̶L̶T̶"̶)̶, but wasting cycles to manifest structured data in an unstructured textual format has massive overhead on the source and destination sides. It only took off because "JavaScript everywhere" was taking off — performance be damned. Protobufs and other binary formats already existed, but JSON was appealing because it's easily inspectable (it's plaintext) and easy to use — `JSON.stringify` and `JSON.parse` were already there.

      We eventually said, "what if we made databases based on JSON?" and then came MongoDB. Worse performance than a relational database, but who cares! It's JSON! People have mostly moved away from document databases, but that's because they realized it was a bad idea for the majority of use cases.

      8 replies →

    • There have been many such cycles, but the XML hysteria of the '00s is the worst I can think of. It lasted a long time, and the square peg of XML was shoved into so many round holes.

      4 replies →

  • > part of the problem with the entire "XML as a savior" mindset of that earlier era

    I think part of the problem is focusing on the wrong aspect. In the case of XSLT, I'd argue its most important properties are being pure, declarative, and extensible. Those can have knock-on effects, like enabling parallel processing, untrusted input, static analysis, etc. The fact it's written in XML is less important.

    Its biggest competitor is JS, which might have nicer syntax but loses those core features of being pure and declarative (we can implement pure/declarative things inside JS if we like, but requiring a JS interpreter at all is bad news for parallelism, security, static analysis, etc.).

    When fashions change (e.g. XML giving way to JS, and JSON), we can end up throwing out good ideas (like a standard way to declare pure data transformations).

    (Of course, there's another layer to this, since XML itself was a more fashionable alternative to S-expressions; and XSLT is sort of like Lisp macros. Everything old is new again...)

  • > Those were defined as machine languages meant for machines talking to machines

    I don't believe this is true. Machine language doesn't need the kind of verbosity that XML provides. SGML/HTML/XML were designed to allow humans to produce machine-readable data, so they were meant for humans to talk to machines and vice versa.

    • Yes, I think the main difference is having imperative vs declarative computation. With declarative computation, the performance of your code is dependent on the performance and expressiveness of the declarative layer, such as XML/XSLT. XSLT lacks the expressiveness to get around its own performance limitations.

XSLT/XPath have evolved since XSLT 1.0.

Features like keys (indexes) are now available to greatly speed up processing. A good XSLT implementation like Saxon definitely helps on the performance side as well.

When it comes to transforming XML into something else, XSLT is quite handy for structuring the logic.
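
A rough sketch of the idea (hypothetical element names, not from any real project): declare an xsl:key once and the processor builds an index, so each lookup no longer scans the whole document.

    <?xml version="1.0"?>
    <xsl:stylesheet version="1.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <!-- Index <line> elements by @product-id once (hypothetical schema). -->
      <xsl:key name="line-by-product" match="line" use="@product-id"/>

      <xsl:template match="product">
        <!-- key() hits the index instead of scanning every <line> for every <product>. -->
        <total product="{@id}">
          <xsl:value-of select="sum(key('line-by-product', @id)/@amount)"/>
        </total>
      </xsl:template>
    </xsl:stylesheet>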

  • Keys were a thing in XSLT 1.x already.

    XSLT 2+ was more about side effects.

    I never really grokked later XSLT and XPath standards though.

    XSLT 1.0 had a steep learning curve, but it was elegant the way poetry is elegant: because of the extra restrictions imposed on it compared to prose. You really had to stretch your mind to do useful stuff with it. Does anyone remember Muenchian grouping (sketched below)? It was gorgeous.

    Newer standards lost elegance and kept the ugly syntax.

    No wonder they lost mindshare.
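
    For anyone who doesn't remember it, a rough from-memory sketch with invented element names: use generate-id() to pick one representative node per key value, then pull the rest of each group back through the same key.

        <?xml version="1.0"?>
        <xsl:stylesheet version="1.0"
                        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <!-- Hypothetical input: a flat list of <item category="..."> elements to group. -->
          <xsl:key name="by-cat" match="item" use="@category"/>

          <xsl:template match="/items">
            <groups>
              <!-- Keep only the first item (in document order) for each category... -->
              <xsl:for-each
                  select="item[generate-id() = generate-id(key('by-cat', @category)[1])]">
                <group category="{@category}">
                  <!-- ...then fetch every member of that category via the key. -->
                  <xsl:copy-of select="key('by-cat', @category)"/>
                </group>
              </xsl:for-each>
            </groups>
          </xsl:template>
        </xsl:stylesheet>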

    • "Newer standards lost elegance and kept the ugly syntax."

      My biggest problem with XSLT is that I've never encountered a problem that I wouldn't rather solve with an XPath library and literally any other general purpose programming language.

      When XSLT was the only thing with XPath you could rely on, maybe it had an edge, but once everyone has an XPath library what's left is a very quirky and restrictive language that I really don't like. And I speak Haskell, so the critic reaching for the reply button can take a pass on the "Oh you must not like functional programming" routine... no, Haskell is included in that set of "literally any other general purpose programming language" above.

      2 replies →

  • XSLT just needs a different, non-XML serialization.

    XML (the data structure) needs a non-XML serialization.

    Similar to how the Semantic Web's OWL has several different serializations, only one of them being the XML serialization. (E.g. OWL can be represented in Functional, Turtle, Manchester, JSON, and N-Triples syntaxes.)

    • > XML (the data structure) needs a non-XML serialization.

      That's YAML, and it is arguably worse. Here's a sample YAML 1.2 document straight from their spec:

          %TAG !e! tag:example.com,2000:app/
          ---
          - !local foo
          - !!str bar
          - !e!tag%21 baz
      

      Nightmare fuel. Just by looking at it, can you tell what it does?

      --

      Some notes:

      - SemWeb also has JSON-LD serialization. It's a good compromise that fits modern tooling nicely.

      - XML is still a damn good compromise between human readable and machine readable. Not perfect, but what is perfect anyway?

      - HTML5 is now more complex than XHTML ever was (all sorts of historical caveats in this claim, I know, don't worry).

      - Markup beauty is relative, we should accept that.

It's odd, because XSLT was clearly made in an era when processing long source XML documents was the norm, and nested loops would obviously blow up...

> XSLT 1.0 is still dominant

How, where? In 2013 I was still working a lot with XSLT and 1.0 was completely dead everywhere one looked. Saxon was free for XSLT 2 and was excellent.

I used to do transformation of both huge documents, and large number of small documents, with zero performance problems.

  • Probably corps. I was working at Factset in the early 2000s when there was a big push for it, and I imagine the same thing was reflected across every Microsoft shop in corporate America, a segment Microsoft was winning big market share in at the time. (I bet there are still a ton of internal web apps that only work with IE... sigh)

    Obviously, that means there are a lot of legacy processes likely still using it.

    The easiest way to improve the situation seems to be to upgrade to a newer version of XSLT.

  • I recently had the occasion to work with a client that was heavily invested in XML processing for a set of integrations. They're migrating / modernizing, but they're so heavily invested in XSL that they don't want to migrate away from it. So I conducted some perf tests, and the performance I found for XSLT in .NET ("Core") was slightly to significantly better than that of current Java with Saxon. But both were fast.

    In the early days, XSL was all interpreted, and it was slow. From ~2004 or so, all the XSLT engines came to be JIT-compiled. XSL benchmarks used to be a thing, but they rapidly declined in value from then on because the perf differences just stopped mattering.

> Even though there are newer XSLT standards, XSLT 1.0 is still dominant.

I'm pretty sure that's because implementing XSLT 2.0 needs a proprietary library (Saxon XSLT[0]). It was certainly the case in the oughts, when I was working with XSLT (I still wake up screaming).

XSLT 1.0 was pretty much worthless. I found that I needed XSLT 2.0 to get what I wanted. I think they are up to XSLT 3.0.

[0] https://en.wikipedia.org/wiki/Saxon_XSLT

  • Are you saying it is specified such that you literally cannot implement it other than on top of, or by mimicking bug-for-bug, that library (the way it was impossible to implement WebSQL without a particular version of SQLite), or is Saxon XSLT just the only existing implementation of the spec?

    • It required support from libxml/libxslt, and that tops out at 1.0. I guess you could implement your own, as it's an open standard, but I don't think anyone ever bothered to.

      I think the guy behind Saxon may be one of the XSLT authors.

Are you using the commercial version of Saxon? It's not expensive, and IMHO worth it for the features it supports (including the newer standards) and the performance. If I remember correctly (it was a long time ago) it does some clever optimizations.

  • We didn't use Saxon, and I don't work there anymore. We also supported client-side (browser) XSLT processing, as well as server-side. It might have helped on the server side, and maybe could even have resolved some algorithmic complexity issues with memoization (possibly trading off memory consumption).

    But in the end the core problem is XSLT, the language. Despite it being a complete programming language, your options for resolving performance issues are very limited when working within the language.

    • O(n^2) issues can typically be solved using keyed lookups, but I agree that the base processing speed is slow and the language really is too obscure to provide good DX.

      I worked with a guy who knew all about complexity analysis, but was quick to assert that "n is always small". That didn't hold - but he'd left the team by the time this became apparent.

  • The final free version of Saxon is a lot faster than earlier ones too. My guess is that it compiles the XSLT in some way for the JVM to use.

From my experience, most simple websites are fine with XSLT 1.0 and don't experience any performance problems.

  • Sure, performance might never become a problem; it's relatively rare. But when it does, there is very little you can do about it.

Same here.

I've seen a couple of blue-chip websites that could be completely taken down just by requesting the sitemap (more than once per minute).

PS: That being said, it is an implementation issue. But it may speak for itself that 100% of the XSLT projects I've seen had it.