Comment by senko
3 hours ago
Yes. But they don't crawl everything (probably due to lack of funding), and, as the article and other commenters here note, people are incentivised to allow Google and only Google to crawl. In practice, the CommonCrawl dataset is too small for a realistic search engine competitor.
I'd love to see Google, Bing and others incentivised (wink, wink) to contribute (technically, financially, etc.) to CommonCrawl or the Internet Archive, since they already do this.