Comment by egonschiele
14 years ago
I use HXT to parse HTML. AFAICT, Hexpat doesn't do much besides parse the XML file into a tree. It doesn't have the niceties that Nokogiri or BeautifulSoup do. For example, I can use Nokogiri to get all the links on a page like so: page.css("a").
HXT allows me to come close to this:
tree >>> getXPathTreesInDoc "//a"
But I haven't seen a single Haskell XML parsing library that is as nice as Nokogiri.
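For comparison, here is a minimal sketch of the page.css("a") task using only HXT core combinators (deep, isElem, hasName, getAttrValue) instead of XPath. It assumes hxt 9.x, where readString takes a config list and withParseHTML yes selects the HTML parser. The sample HTML string and the name linkHrefs are made up for illustration. getXPathTreesInDoc itself comes from the separate hxt-xpath package, so it is not used here.

```haskell
import Text.XML.HXT.Core

-- Hypothetical helper: collect the href of every <a> element,
-- roughly what Nokogiri's page.css("a") gives you (plus the attribute lookup).
-- `deep` searches the whole tree for matching nodes.
linkHrefs :: ArrowXml a => a XmlTree String
linkHrefs = deep (isElem >>> hasName "a") >>> getAttrValue "href"

main :: IO ()
main = do
  -- Made-up sample page, parsed as HTML with parser warnings switched off.
  let html = "<html><body><a href=\"/one\">1</a><p><a href=\"/two\">2</a></p></body></html>"
  hrefs <- runX $ readString [withParseHTML yes, withWarnings no] html >>> linkHrefs
  print hrefs
```

Because linkHrefs is an ordinary arrow, it composes with anything else that produces a document tree, in the same way as the XPath version.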
In my work, I read in XML, parse its elements, attributes, and data, and produce new XML. Along with Parsec, Hexpat is well-suited to the task.
I haven't had to parse HTML in Haskell. I use BeautifulSoup for that. I wouldn't be surprised if the Haskell libraries were less useful for that kind of thing.
I wrote up a guide to working with HTML in HXT: http://adit.io/posts/2012-04-14-working_with_HTML_in_haskell...
You might find it handy if you decide to give HXT another go :)