Comment by JonChesterfield

2 years ago

It turns out XSLT is a trivial programming model (pattern match on the tree) with crazy aesthetics and a near-useless stdlib, which shipped an MVP called 1.0 and then got basically abandoned for JSON. It's a bit of a disaster of history really. There are newer and saner specifications out there (XSLT 2.0 and 3.0) which I am totally ignoring. A bit too much of the world is written in Java, but you can get xsltproc, which is a tiny C program built on libxml2 (via libxslt), and run that from the command line. As far as esoteric languages go it's great.

I'm doing dubious things involving representing ASTs in XML which I'll probably post to GitHub. In the meantime, here's an example from that. The setup is roughly a bunch of calls to xsltproc, each turning one XML representation into another:

    xsltproc --output thing.list.xml tree_to_list.xsl thing.tree.xml

I'm liking having a schema for the before and the after forms, and generally having a much better time when each individual transform doesn't do very much. The general pattern is to recognise the bits you're interested in and copy everything else through unchanged, and to view a given .xsl transform as an XML->XML function in weird syntax. Google will find you an "identity transform" which looks like nonsense; a lot of functions can start by copying that and then adding a match for something you care about.
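
For reference, a minimal sketch of that starting point: the usual XSLT 1.0 identity transform plus one extra match. The OldName and NewName elements are made up for illustration and aren't from the real project.

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:transform version="1.0"
                   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <!-- Identity: copy every node and attribute through unchanged -->
      <xsl:template match="@*|node()">
        <xsl:copy>
          <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
      </xsl:template>

      <!-- The one thing this transform cares about (hypothetical):
           rename OldName to NewName, keeping attributes and contents.
           A named match has higher default priority than node(),
           so it wins for OldName elements -->
      <xsl:template match="OldName">
        <NewName>
          <xsl:apply-templates select="@*|node()"/>
        </NewName>
      </xsl:template>
    </xsl:transform>

Everything that isn't an OldName element falls through to the identity template, which is what makes "copy the rest unchanged" the default.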

One of my data representations is a tree where the information of interest is all in the leaves. That gets turned into a flattened list, then that list gets turned into a text file, then something else goes "oh that text file has C in it, awesome". Tree flattening looks like a reasonable thing to copy&paste in here, thus:

    <?xml version="1.0" encoding="UTF-8"?>
    <!-- name some extensions xsltproc knows about -->
    <xsl:transform version="1.0"
                   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                   xmlns:str="http://exslt.org/strings"
                   xmlns:ext="http://exslt.org/common"
                   extension-element-prefixes="str ext"
                   >
      <xsl:output method="xml" indent="yes"/>
    
      <!-- Change the root node from tree to list -->
      <xsl:template match="/TokenTree">
        <TokenList>
          <xsl:apply-templates select="node()|@*"/>    
        </TokenList>
      </xsl:template>
    
      <!-- Copy attributes to the output unchanged.
           Attributes have no attributes of their own,
           so a bare copy is enough -->
      <xsl:template match="@*">
        <xsl:copy/>
      </xsl:template>
    
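      <!-- Transform everything else on the way through.
           node() also matches text and comments; those
           have no attributes and no children, so the
           first branch below silently drops them -->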
      <xsl:template match="node()">
        <!-- If there are no attributes, it's part
             of the tree structure we're flattening,
             throw away the element and keep the contents -->
        <xsl:if test="not(@*)">
          <xsl:apply-templates select="node()|@*"/>
        </xsl:if>
    
        <!-- If it does have attributes, leave it alone -->
        <xsl:if test="@*">
          <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
          </xsl:copy>
        </xsl:if>
      </xsl:template>    
    </xsl:transform>
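
To make the effect concrete, here is a made-up input. The Group and Token element names and the value attribute are placeholders, not the real schema:

    <TokenTree>
      <Group>
        <Token value="int"/>
        <Group>
          <Token value="x"/>
        </Group>
      </Group>
      <Token value=";"/>
    </TokenTree>

Working through the templates by hand, the attribute-less Group elements vanish, the whitespace between elements is dropped, and the tokens come out in document order. The result should look roughly like this (exact whitespace and the XML declaration may differ):

    <TokenList>
      <Token value="int"/>
      <Token value="x"/>
      <Token value=";"/>
    </TokenList>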

So yeah. The syntax is not wonderful. You write xsl: a lot, or at least your editor does. The documentation on this stuff all seems to be a bit Java-themed, and the prevailing attitude seems to be that XML is an ugly thing from the before times. Though see also https://www.defmacro.org/ramblings/lisp.html. I'm working by trial and error instead of documentation but that's going well enough. The "oh, that's a tree? Have a declarative DSL for functional transforms to other trees" idea is a really compelling example of wondrous magic hidden behind insane syntax.
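
The list-to-text step mentioned earlier isn't shown here. A minimal sketch of what such a stage could look like, assuming each Token carries its text in a hypothetical value attribute (the real representation may well differ):

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:transform version="1.0"
                   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <!-- Emit plain text rather than XML -->
      <xsl:output method="text"/>

      <!-- One line of output per token; "Token" and "value"
           are assumed names for illustration -->
      <xsl:template match="Token">
        <xsl:value-of select="@value"/>
        <xsl:text>&#10;</xsl:text>
      </xsl:template>

      <!-- Suppress the built-in rule that would otherwise copy
           stray text (e.g. indentation) into the output -->
      <xsl:template match="text()"/>
    </xsl:transform>

It would be run the same way as the other stages, for example with made-up file names:

    xsltproc --output thing.txt list_to_text.xsl thing.list.xml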

(For a nice bonus effect, Emacs is really clear on what editing tree-structured documents means, and there's a RELAX NG schema which you can use to find errors in the XML and to have Emacs tell you lots of stuff about the document as you type it.)

vidarh 2 years ago

I see the appeal of that - I used to toy with a language whose default representation was XML (with two-way translation from/to text as well as a diagram-based editor), but XSL is way too verbose a syntax for me, given that underneath it is a very simple core which you can build out as a library to write the same kind of tree rewrites.

Today I'd pick that option over actually using XSL anywhere - to me the only redeeming feature of XSL itself is/was the built-in support for applying XSL to XML in browsers (I worked on a web app ~2006 where XML was translated to HTML using XSL on the frontend, and you could turn off the server-side transformation and let the browser do it instead, which meant your source view was the underlying XML, which was very handy for debugging).