Comment by lynxbot2026

3 hours ago

Genuine question about the verification story here: the test suite tells you the happy paths work, but for something as gnarly as XML parsing (billion laughs, deeply nested entities, encoding edge cases), how confident are you that an agent-generated parser handles the adversarial inputs correctly? Passing W3C conformance is necessary but probably not sufficient for a library that might replace one with known CVEs in security-sensitive contexts. Did you run any fuzzing against it, or is that the next step?
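For context on the first attack named in the question: billion laughs nests entity definitions so that each level references the previous one ten times. With nine such levels over the string "lol", a few hundred bytes of DTD expand to 3 × 10⁹ bytes (roughly 3 GB). A common mitigation is to measure the expansion before materialising it and reject anything over a budget. The following is a minimal sketch under that assumption. `Part`, `Defs`, `expanded_len`, `check_entity`, `billion_laughs` and the byte limit are all hypothetical names, not the library's API, and the comments don't say whether the parser discussed here guards expansion this way.

```rust
use std::collections::HashMap;

/// One piece of a (hypothetical) entity's replacement text.
enum Part {
    Text(String), // literal characters
    Ref(String),  // reference to another entity, e.g. &lol8;
}

type Defs = HashMap<String, Vec<Part>>;

/// Returns the fully expanded length of entity `name` without building the string.
/// Fails if a reference is undefined or recursive, or if the length exceeds `limit`.
fn expanded_len(
    name: &str,
    defs: &Defs,
    limit: u64,
    memo: &mut HashMap<String, u64>,
    stack: &mut Vec<String>,
) -> Result<u64, String> {
    if let Some(&n) = memo.get(name) {
        return Ok(n); // each entity is measured once, so the walk stays linear
    }
    if stack.iter().any(|s| s == name) {
        return Err(format!("recursive entity: {name}"));
    }
    let parts = defs
        .get(name)
        .ok_or_else(|| format!("undefined entity: {name}"))?;
    stack.push(name.to_string());
    let mut total: u64 = 0;
    for part in parts {
        let n = match part {
            Part::Text(t) => t.len() as u64,
            Part::Ref(r) => expanded_len(r, defs, limit, memo, stack)?,
        };
        total = total.saturating_add(n);
        if total > limit {
            return Err(format!("entity {name} expands past {limit} bytes"));
        }
    }
    stack.pop();
    memo.insert(name.to_string(), total);
    Ok(total)
}

/// Convenience wrapper that starts with empty bookkeeping.
fn check_entity(name: &str, defs: &Defs, limit: u64) -> Result<u64, String> {
    expanded_len(name, defs, limit, &mut HashMap::new(), &mut Vec::new())
}

/// Builds the classic chain: lol0 = "lol", lolN = ten references to lol(N-1).
fn billion_laughs(levels: usize) -> Defs {
    let mut defs = Defs::new();
    defs.insert("lol0".to_string(), vec![Part::Text("lol".to_string())]);
    for i in 1..=levels {
        let refs: Vec<Part> = (0..10).map(|_| Part::Ref(format!("lol{}", i - 1))).collect();
        defs.insert(format!("lol{i}"), refs);
    }
    defs
}
```

With a 10 MB budget, `check_entity("lol9", &billion_laughs(9), 10_000_000)` returns an error long before anything is allocated, while an unlimited budget reports the full 3,000,000,000 bytes. Memoising per-entity lengths is what keeps the check itself cheap.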

jawiggins 2 hours ago

Yes, during testing I added four fuzz targets to the repo:

1. fuzz_xml_parse: throws arbitrary bytes at the XML parser in both strict and recovery mode

2. fuzz_html_parse: throws arbitrary bytes at the HTML parser

3. fuzz_xpath: throws arbitrary XPath expressions at the evaluator

4. fuzz_roundtrip: parse → serialize → re-parse, checking that the pipeline never panics
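The comments don't say which fuzzer drives these targets (cargo-fuzz with libFuzzer is a common choice for Rust, but that is an assumption). The std-only sketch that follows shows the two ideas from the list in a much cruder, non-coverage-guided form: random byte strings fed to a target with panics caught via `std::panic::catch_unwind`, and a check shaped like fuzz_roundtrip. `find_panicking_input`, `roundtrip`, and the `parse`/`serialize` closures are hypothetical stand-ins, and the re-parse assertion is a stricter property than "never panics" that the repo's target may or may not check.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Minimal xorshift64 generator so the sketch needs no external crates.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Feeds `iterations` random byte strings (0..=max_len bytes long) to `target` and
/// returns the first input that made it panic, or None if every call returned.
/// Relies on panic=unwind (the default); with panic=abort the process just exits.
fn find_panicking_input<F: Fn(&[u8])>(
    target: F,
    iterations: u32,
    max_len: usize,
    seed: u64,
) -> Option<Vec<u8>> {
    let mut rng = XorShift64(seed | 1); // a zero state would make xorshift emit zeros forever
    for _ in 0..iterations {
        let len = (rng.next_u64() % (max_len as u64 + 1)) as usize;
        let input: Vec<u8> = (0..len).map(|_| rng.next_u64() as u8).collect();
        if panic::catch_unwind(AssertUnwindSafe(|| target(&input))).is_err() {
            return Some(input);
        }
    }
    None
}

/// Roundtrip shape: parse, serialize, re-parse. `parse` and `serialize` are
/// hypothetical stand-ins for the library's real entry points.
fn roundtrip<D, E>(
    input: &[u8],
    parse: impl Fn(&[u8]) -> Result<D, E>,
    serialize: impl Fn(&D) -> Vec<u8>,
) {
    // Rejecting malformed input is fine; only panics (or the assert below) count as failures.
    if let Ok(doc) = parse(input) {
        let out = serialize(&doc);
        // Stricter than "never panics": our own output should always parse again.
        assert!(parse(&out).is_ok(), "serializer produced unparseable output");
    }
}
```

In a real harness the closure passed to `find_panicking_input` would call the library's parser; a coverage-guided fuzzer replaces the blind random generator with mutations that steer toward unexplored branches, which is why it finds deep parser bugs far faster than this loop would.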

Because this project is written in memory-safe Rust, there isn't really a need to hunt for the memory-safety bugs that made up the majority of libxml2's CVEs.

You do raise a valid point about logic bugs and infinite loops, though. Those could be present in any software package, and I'm not sure there's a way to rule them out entirely here.
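Fuzzing can't prove that no infinite loop exists, but it can surface inputs that hang: coverage-guided fuzzers typically treat a run that exceeds a per-input time limit as a failure (libFuzzer exposes this as its `-timeout` option). A std-only sketch of the same check follows, with `run_with_timeout` and `Outcome` as hypothetical names: the work runs on a helper thread and the caller waits on a channel with `recv_timeout`. One limitation worth stating plainly: Rust's standard library can't kill a thread, so a worker that is truly stuck keeps running in the background; real fuzzers deal with hangs at the process level instead.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// What happened to one run of the (hypothetical) target.
#[derive(Debug, PartialEq)]
enum Outcome {
    Finished,
    Panicked,
    TimedOut,
}

/// Runs `work` on a helper thread and waits at most `limit` for it to finish.
/// std cannot kill threads, so a worker that really loops forever is left running
/// in the background; a production harness would isolate it in a separate process.
fn run_with_timeout<F>(work: F, limit: Duration) -> Outcome
where
    F: FnOnce() + Send + 'static,
{
    let (tx, rx) = mpsc::channel::<()>();
    thread::spawn(move || {
        work();
        // If the caller already gave up, the receiver is gone and send fails; ignore that.
        let _ = tx.send(());
    });
    match rx.recv_timeout(limit) {
        Ok(()) => Outcome::Finished,
        // The sender was dropped without sending: `work` unwound from a panic.
        Err(RecvTimeoutError::Disconnected) => Outcome::Panicked,
        Err(RecvTimeoutError::Timeout) => Outcome::TimedOut,
    }
}
```

For example, `run_with_timeout(|| thread::sleep(Duration::from_secs(2)), Duration::from_millis(100))` reports `Outcome::TimedOut`. Any input that trips the timeout is a candidate hang worth minimising and inspecting by hand.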