Comment by jeffbee
5 months ago
What alternatives exist for extracting structured data from the web? I have several ETL pipelines that use htmltidy to turn tag soup into something approximately valid and xmlstarlet to transform it into tabular data.
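For concreteness, the tidy-then-transform step described above could be sketched with Python's standard library alone. This is only an illustration, assuming the data of interest sits in HTML table cells; `TableRows` and `html_to_csv` are hypothetical names, not part of htmltidy, xmlstarlet, or any pipeline mentioned here.

```python
import csv
import io
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of <td>/<th> cells, one list per <tr> (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.rows = []      # completed rows
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def _end_cell(self):
        if self._cell is not None:
            # Collapse runs of whitespace inside the cell text.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def _end_row(self):
        self._end_cell()
        if self._row:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._end_row()      # tag soup often omits </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._end_cell()     # ...and </td> or </th>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._end_cell()
        elif tag in ("tr", "table"):
            self._end_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def html_to_csv(html):
    """Return the table cells found in `html` as CSV text."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    parser._end_row()            # flush a row left open at end of input
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()
```

Because `html.parser` reports tags as it encounters them rather than requiring well-formed input, unclosed `<tr>` and `<td>` elements are handled by closing the previous row or cell whenever a new one starts. Nested tables and `colspan`/`rowspan` are not handled in this sketch.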