Comment by riedel

5 months ago

I think that, on the grammar side, XPath made some design decisions that make it really hard to implement efficiently in general. About 10 years ago I was looking into binary XML systems and compiling things down for embedded systems, and realized that it is really hard to, e.g., create efficient transducers (in/out pushdown automata) for XSLT due to the complexity of XPath.

Streaming is defined in the XSLT 3.0 spec: https://www.w3.org/TR/xslt-30/#streamability. When you want to use streaming, you are confined to a subset of XPath that is "guaranteed streamable", i.e. you can't just freely navigate the tree anymore. There are some special constructs in XSLT, such as <xsl:merge> and <xsl:accumulator>, that make it easier to collect your results.

Saxon's paid edition supports it. I've done it a few times, but you have to write your XSLT in a completely different way to make it work.
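
To give a feel for what "not freely navigating the tree" means in practice, here is a rough analogy in Python rather than XSLT. It is only a sketch of the same forward-only idea using the standard library's `iterparse`, not how Saxon or the XSLT 3.0 streaming rules actually work. The function name `sum_amounts` and the `orders`/`order`/`amount` element names are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def sum_amounts(stream, tag="amount"):
    """Single forward pass: keep only a running total, never the whole tree."""
    total = 0.0   # plays the role of an accumulator value
    depth = 0     # nesting depth of the element currently open
    root = None
    for event, elem in ET.iterparse(stream, events=("start", "end")):
        if event == "start":
            if root is None:
                root = elem  # remember the document element
            depth += 1
            continue
        depth -= 1
        if elem.tag == tag and elem.text:
            total += float(elem.text)
        if depth == 1:
            # A direct child of the root just closed: drop it so memory
            # stays bounded by the size of one top-level record.
            root.clear()
    return total


# Hypothetical sample document, for illustration only.
SAMPLE = (
    b"<orders>"
    b"<order><amount>10.5</amount></order>"
    b"<order><amount>4.5</amount><note>x</note></order>"
    b"</orders>"
)

print(sum_amounts(io.BytesIO(SAMPLE)))  # 15.0
```

The price is the same kind of restriction: once a record has been cleared you cannot look back at it, so anything you need later has to be carried forward explicitly in state like `total`.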

  • As I remember, this was under development at the time. However, without remembering all the details, working with bounded memory on communication buffers was a pain, not because of XSLT itself but mostly because of its interactions with XPath. At the time I was formally looking into hedge grammars and visibly pushdown automata as a formal basis, but it seemed to me that the formal complexity had been unnecessarily pushed beyond what was straightforwardly feasible. As I said, this was about transforming binary (intermediate) representations of XML. The use case was actually to build message middleware/routers for IoT at the time. IMHO the binary XML standards that were picked were also mostly the wrong choice for small embedded systems. (Interestingly, MPEG-7 is, btw, one of the few standards that supports rather nice streaming binary XML. I think, however, that it is only used in digital broadcasting.)