Comment by BbzzbB

4 years ago

That'd explain some of the holes mentioned in these comments. I think you just want to match any "word" containing ".[valid TLD]" and then exclude invalid URLs (an "@" in the first part indicating an email address, etc.).
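
Roughly, that idea could be sketched like this in Python. This is only an illustration and isn't taken from any particular library: the `find_urls` name and the tiny `VALID_TLDS` set are made up here (a real version would presumably load the full IANA TLD list), and it will still miss plenty of edge cases.

```python
import re

# Hypothetical, deliberately tiny TLD set for illustration only.
VALID_TLDS = {"com", "org", "net", "io", "dev"}

def find_urls(text):
    """Return whitespace-separated "words" that look like URLs."""
    urls = []
    for word in text.split():
        # Drop surrounding punctuation such as a trailing comma or period.
        candidate = word.strip(".,;:!?()[]<>\"'")
        # Ignore an optional scheme like "https://".
        rest = re.sub(r"^[a-z][a-z0-9+.-]*://", "", candidate, flags=re.IGNORECASE)
        # The "first part" is everything before the path, query or fragment.
        host = re.split(r"[/?#]", rest, maxsplit=1)[0]
        # An "@" in the first part suggests an email address, so skip it.
        if "@" in host:
            continue
        # Drop a port number if there is one.
        host = host.split(":", 1)[0]
        labels = host.lower().split(".")
        # Need at least "name.tld" with no empty labels.
        if len(labels) < 2 or not all(labels):
            continue
        if labels[-1] in VALID_TLDS:
            urls.append(candidate)
    return urls

print(find_urls("Mail me at bob@example.com or see example.org/docs, not foo.bar."))
# prints ['example.org/docs']
```

One known hole: a URL with credentials in it (`user:pass@host.com`) gets thrown out along with the emails, so the "@" check would need to be smarter if that matters to you.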

I've been using this[0] Python library, which seemed good enough for my needs on a scraping project.