Comment by hrmtst93837
6 hours ago
If you want a parser that actually checks the XML spec and various edge cases, then parsing goes from human-readable config to O(n^2) string handling. The funny part is how often people silently accept partial or broken XML in prod because revisiting schema validation years later is a nightmare. If you want cheap parsing, you end up writing a regex or DOM walker and hoping for the best, which raises the question of why not just use JSON or invent a different DSL to start.
(Properly formatted) XML can be parsed, and streamed, by a visibly-pushdown automaton[1][2].
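To make the "visibly pushdown" part concrete, here is a rough sketch of the idea rather than anything taken from the cited papers. The defining property is that the input symbol alone decides the stack action: an opening tag is a "call" (push), a closing tag is a "return" (pop), and text is "internal" (stack untouched). The names `TOKEN`, `tokenize` and `well_nested` are made up for illustration, and the tokenizer is an assumption that only understands bare `<name>`, `</name>` and text. Attributes, comments, CDATA, self-closing tags and the single-root rule are all ignored, so it checks tag nesting only, not full well-formedness.

```python
import re
from typing import Iterator, Tuple

# Hypothetical toy tokenizer: bare <name>, </name>, or a run of text.
TOKEN = re.compile(r"<(/?)([A-Za-z_][\w.-]*)>|([^<]+)")


def tokenize(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (kind, value) pairs, where kind is 'call', 'return' or 'internal'."""
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unrecognised input at offset {pos}")
        slash, name, chars = m.groups()
        if chars is not None:
            yield ("internal", chars)   # text: no stack action
        elif slash:
            yield ("return", name)      # closing tag: pop
        else:
            yield ("call", name)        # opening tag: push
        pos = m.end()


def well_nested(text: str) -> bool:
    """Check tag nesting in one left-to-right pass over the token stream."""
    stack = []
    try:
        for kind, value in tokenize(text):
            if kind == "call":
                stack.append(value)
            elif kind == "return":
                # A return must match the most recent unmatched call.
                if not stack or stack.pop() != value:
                    return False
            # 'internal' symbols never touch the stack.
    except ValueError:
        return False
    return not stack  # every call must have been matched by a return
```

Because the push/pop decision never depends on the current state, the tokens can be consumed one at a time as they arrive, which is what makes the streaming case straightforward.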
"Visibly Pushdown Expressions"[3] can simplify parsing with a terse syntax styled after regular expressions, and there's an extension to SQL which can query XML documents using VPAs[4].
JSON can also be parsed and validated with visibly pushdown automata. There's an interesting project[5] which aims to automatically produce a VPA from a JSON-schema to validate documents.
In theory these should be able to outperform parsers based on deterministic pushdown automata (i.e., (LA)LR parsers), but they're less widely used and understood, as they're much newer than the conventional parsing techniques and absent from the popular literature (Dragon Book, EAC, etc.).
[1]:https://madhu.cs.illinois.edu/www07.pdf
[2]:https://www.cis.upenn.edu/~alur/Cav14.pdf
[3]:https://homes.cs.aau.dk/~srba/courses/MCS-07/vpe.pdf
[4]:https://web.cs.ucla.edu/~zaniolo/papers/002_R13.pdf
[5]:https://www.gaetanstaquet.com/ValidatingJSONDocumentsWithLea...
Without looking, I guessed that all your quotes come from academic papers. I was right.
Because real life is nothing like what is taught in CS classes.
I'm not an academic and have extensive experience with parsing.
But for whatever reason, VPAs slipped under my radar until very recently - I only discovered them a few weeks ago and have been quite fascinated. I have been reading a lot (the citations I've given are some of my recent reading), and am currently working on a visibly pushdown parser generator. I'm more interested in the practical use than the academic side, but there are few resources besides academic papers for me to go off.
Thought it might be interesting to share in case others like me have missed out on VPAs.