Comment by egonschiele

14 years ago

I use HXT to parse HTML. AFAICT, Hexpat doesn't do much besides parse the XML file into a tree. It doesn't have the niceties that Nokogiri or BeautifulSoup do. For example, I can use Nokogiri to get all the links on a page like so: page.css("a").

HXT allows me to come close to this:

tree >>> getXPathTreesInDoc "//a"
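
For anyone who wants to try it, here is a rough, self-contained sketch of that arrow in context. It assumes the hxt and hxt-xpath packages are installed; extractLinks and the sample HTML are made-up names for illustration, not part of either library, and pulling out the href attribute is my own addition to the one-liner above.

```haskell
-- Sketch only: assumes the hxt and hxt-xpath packages are available.
import Text.XML.HXT.Core
import Text.XML.HXT.XPath.Arrows (getXPathTreesInDoc)

-- | Collect the href value of every <a> element in an HTML string.
-- extractLinks is a hypothetical helper name, not an HXT function.
-- getAttrValue yields "" for an <a> that has no href attribute.
extractLinks :: String -> IO [String]
extractLinks html = runX $
  -- Parse leniently as HTML rather than strict XML.
  readString [withParseHTML yes, withWarnings no] html
    -- Same XPath selection as in the comment above.
    >>> getXPathTreesInDoc "//a"
    >>> getAttrValue "href"

main :: IO ()
main = do
  -- Made-up sample page for illustration.
  links <- extractLinks
    "<html><body><a href=\"/about\">About</a> <a href=\"/blog\">Blog</a></body></html>"
  mapM_ putStrLn links
```

The shape is the same as the Nokogiri call: select the nodes, then project out what you need. The extra ceremony is mostly the parser configuration and runX.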

But I haven't seen a single Haskell XML parsing library that is as nice as Nokogiri.

In my work, I read in XML, parse its elements, attributes, and data, producing new XML. Along with Parsec, Hexpat is well-suited to the task.

I haven't had to parse HTML in Haskell. I use BeautifulSoup for that. I wouldn't be surprised if the Haskell libraries aren't as useful for that kind of thing.