Comment by cwisecarver

8 years ago

This sounds to me like an object lesson in "Why you shouldn't write your own HTML parser."

Every time I see a dev trying to parse HTML with a custom solution, a regex, or anything other than a proven OSS library designed to parse HTML, I recoil reflexively. Sure, maybe you don't need a parser to see if that strong tag is properly closed, but the alternative is ...

You're right in 99+% of cases. But I suspect that Cloudflare's needs for this use case aren't typical of what's expected of an HTML parser. I'm not certain that there isn't an existing parser that would work for them, but I'm equally not certain that there is.

  • I can see the argument, but 99+% of this audience isn't Cloudflare. My comment was more directed at those who aren't. Special use cases are all over the place. It's just about making sure you're choosing a custom solution because your use case really is special, and that when you re-implement something, you're doing it because it's different and better, not because you'd rather write something than integrate.

  • Even so, if the parser handles security or human safety, then it shouldn't be written in C, or even with a parser generator that generates C.

    Just use ML, or Rust, or bloody JavaScript for all I care. I don't care if they add a ton to response time, or add 100% perf overhead costs for running the thing.

    Having an OS, SSL library, web server, etc. written in C is bad enough, but at least that code has many eyes on it. Companies shouldn't throw their custom-made tyres on top of that fire.