Comment by niftich
8 years ago
Your comment doesn't apply to this particular case, because the submission explains in great detail that the parser in question was written with Ragel, a parser generator. The code they wrote in Ragel contained a bug, which lay uncaught and dormant for years, and manifested only when the calling/wrapping code was altered.
It still seems like a gross mismatch of power, though. Correct me if I'm wrong, but Ragel can only output parsers for regular languages, yes? You can't call their Ragel code an HTML parser, because Ragel can't output a parser powerful enough to parse HTML.
HTML isn't a CFG. The HTML spec is set up as a state machine (= regular language) plus a number of side data structures, like the stack of open elements and the list of active formatting elements. This maps very easily to Ragel, where your actions can have side effects and reference internal state within the language.
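To make the "state machine plus side data structures" point concrete, here's a toy sketch in Python. It is not Ragel and not anyone's real parser; `parse_tags`, the state names, and the event tuples are all made up for illustration, and it handles neither attributes nor malformed input. The transitions only look at the current state and character, while the stack of open elements is touched purely from the actions fired on those transitions.

```python
def parse_tags(text):
    """Scan a toy tag language with a finite-state machine.

    Hypothetical illustration: the states form a plain finite automaton,
    and `open_elements` is side state mutated only by transition actions.
    """
    state = "data"
    name = []
    open_elements = []  # side data structure, like the spec's stack of open elements
    events = []

    for ch in text:
        if state == "data":
            if ch == "<":
                state = "tag_open"
        elif state == "tag_open":
            if ch == "/":
                state, name = "close_name", []
            else:
                state, name = "open_name", [ch]
        elif state == "open_name":
            if ch == ">":
                tag = "".join(name)
                # Action: push onto the stack of open elements.
                open_elements.append(tag)
                events.append(("open", tag, len(open_elements)))
                state = "data"
            else:
                name.append(ch)
        elif state == "close_name":
            if ch == ">":
                tag = "".join(name)
                # Action: pop up to and including the matching element,
                # implicitly closing anything left open above it.
                if tag in open_elements:
                    while open_elements.pop() != tag:
                        pass
                events.append(("close", tag, len(open_elements)))
                state = "data"
            else:
                name.append(ch)

    return events, open_elements
```

Feeding it the misnested input `<b><i>x</b>` leaves the stack empty, because the action for `</b>` also pops the unclosed `i`. That kind of recovery lives in the action code, not in the automaton's transition table.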
> HTML isn't a CFG. The HTML spec is set up as a state machine (= regular language) plus a number of side data structures, like the stack of open elements and the list of active formatting elements.
That's...that's what a context-free grammar is.
(FWIW, wild-type HTML might not be context-free and might instead require a higher-powered parser.)