Comment by mintplant
5 months ago
Would it be possible to transform a large XML document into something on-disk that could be queried like a database by the XPath evaluator?
> Would it be possible to transform a large XML document into something on-disk that could be queried like a database by the XPath evaluator?
Given the nature of this processing, I think even NVMe-based disk storage would be awfully slow. (People often forget, or never realize, that the "gigabytes per second" NVMe yields is for sequential access. Random access is quite a bit slower; it still stomps spinning rust, but by much less. And this is going to be a random-access sort of job, so we're in the "several multiples slower than RAM" regime of access.) This sort of thing really wants RAM, and even then, RAM laid out with an eye toward cache locality and other such performance considerations.
You'd basically be building an index into each node.
There are some fast databases that store prefix trees, which might actually be suitable for such a task (something like InfinityDB). Building this database will take a while, though, since it requires parsing the entire document. But I suppose if reading/querying is going to happen many times, it's worth it?
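For what it's worth, the prefix-key idea can be sketched with nothing but stdlib sqlite3: store each node's slash-separated path as a key in a B-tree index, then answer descendant queries with a range scan over the prefix. Everything here (`build_index`, `descendants`, the path scheme) is invented for illustration, not how InfinityDB or any real XML store works.

```python
import sqlite3
import xml.etree.ElementTree as ET

def build_index(xml_text: str, db_path: str = ":memory:") -> sqlite3.Connection:
    """Parse the whole document once and store path -> text in a B-tree."""
    conn = sqlite3.connect(db_path)  # would be a file on disk in practice
    conn.execute("CREATE TABLE nodes (path TEXT PRIMARY KEY, text TEXT)")
    root = ET.fromstring(xml_text)
    conn.execute("INSERT INTO nodes VALUES (?, ?)",
                 (f"/{root.tag}", root.text or ""))
    def walk(elem, prefix):
        for i, child in enumerate(elem):
            path = f"{prefix}/{child.tag}[{i}]"  # positional, like XPath
            conn.execute("INSERT INTO nodes VALUES (?, ?)",
                         (path, child.text or ""))
            walk(child, path)
    walk(root, f"/{root.tag}")
    conn.commit()
    return conn

def descendants(conn, prefix):
    """Range scan over the primary-key B-tree: all keys starting with prefix.
    The '\xff' upper bound is a crude trick that works for ASCII tag names."""
    cur = conn.execute(
        "SELECT path, text FROM nodes WHERE path >= ? AND path < ? ORDER BY path",
        (prefix, prefix + "\xff"))
    return cur.fetchall()
```

The point of the prefix-ordered keys is that a descendant lookup becomes one contiguous index scan instead of a tree traversal, which is the access pattern B-trees (and real prefix-tree stores) are good at.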
It seems to me one could replace each text node with its byte offset into the original document. Perhaps limit it to longer instances?
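A rough sketch of that offset idea with stdlib expat: record a (byte offset, length) pair for each text run above a size threshold, so the text itself can later be fetched with seek/read instead of being held in memory. `index_text` and `MIN_LEN` are made-up names, and this ignores entity references and expat's habit of splitting long character data into chunks, both of which would need handling in a real implementation.

```python
import xml.parsers.expat

MIN_LEN = 8  # only index the longer text nodes, per the suggestion above

def index_text(data: bytes):
    """Return a list of (byte_offset, byte_length) spans for large text nodes.

    CurrentByteIndex gives the byte position in the raw input at which the
    current character-data event starts, so data[off:off+n] recovers the text
    for plain UTF-8 content with no entity references.
    """
    spans = []
    parser = xml.parsers.expat.ParserCreate()
    def on_text(text):
        encoded = text.encode("utf-8")
        if len(encoded) >= MIN_LEN:
            spans.append((parser.CurrentByteIndex, len(encoded)))
    parser.CharacterDataHandler = on_text
    parser.Parse(data, True)
    return spans
```

With an index like this, the XPath evaluator's node table only carries small fixed-size (offset, length) pairs, and the actual strings stay on disk until a query touches them.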
Like MarkLogic?