Comment by BbzzbB
3 years ago
That'd explain some of the holes mentioned in these comments. I think you just want to match any "word" containing ".[valid TLD]" and then exclude invalid URLs (an "@" in the first part indicates an email address, etc.).
I've been using this[0] Python library, which has been good enough for my needs in a scraping project.
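For illustration, the "match a word containing .[valid TLD], then drop emails" idea could be sketched roughly as below. This is not the library's code or API. The `VALID_TLDS` set, the `CANDIDATE` pattern and the `find_urls` name are all made up for the example, and a real version would presumably load the full IANA TLD list instead of five hardcoded entries.

```python
import re

# Assumption: a tiny hardcoded TLD set, for illustration only.
VALID_TLDS = {"com", "org", "net", "io", "dev"}

# Optional http(s) scheme, a dotted host, optional port, optional path/query/fragment.
CANDIDATE = re.compile(
    r"^(?:https?://)?"
    r"(?P<host>[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)"
    r"(?::\d+)?"
    r"(?:[/?#]\S*)?$"
)


def find_urls(text):
    """Return whitespace-separated words that look like URLs (hypothetical helper)."""
    urls = []
    for word in text.split():
        # Drop punctuation that commonly clings to a URL in running text.
        word = word.strip(".,;:!?()[]<>\"'")
        # An "@" before the first "/" suggests an email address, so skip it.
        first_part = re.sub(r"^https?://", "", word).split("/", 1)[0]
        if "@" in first_part:
            continue
        match = CANDIDATE.match(word)
        if match is None:
            continue
        # Keep the word only if the last host label is a known TLD.
        tld = match.group("host").rsplit(".", 1)[1].lower()
        if tld in VALID_TLDS:
            urls.append(word)
    return urls


sample = (
    "Mail me at bob@example.com or see example.org/docs, "
    "also https://news.ycombinator.com. Version 1.2.3 and file.txt are not URLs."
)
print(find_urls(sample))
# ['example.org/docs', 'https://news.ycombinator.com']
```

The TLD check is what keeps things like "1.2.3" or "file.txt" out. It would still miss URLs with no whitespace around them and anything with a non-ASCII hostname, so treat it as a starting point only.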