Comment by moebrowne

15 hours ago

Isn't this what CommonCrawl are doing?

https://commoncrawl.org/

Yes, but they don't crawl everything (probably due to lack of funding), and, as the article and other commenters here note, site owners are incentivised to allow Google, and only Google, to crawl. In practice, the CommonCrawl dataset is too small to support a realistic search engine competitor.
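To make the "only Google" point concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content and the `example.com` URL are made up for illustration; `CCBot` is the user agent CommonCrawl's crawler identifies as. It only shows how a Googlebot-only policy shuts out every other well-behaved crawler, not what any particular site actually serves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Googlebot may fetch everything,
# every other crawler is disallowed from the whole site.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def allowed(user_agent: str, url: str = "https://example.com/page") -> bool:
    """Return True if this robots.txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "CCBot", "bingbot"):
        print(agent, allowed(agent))
```

Under that policy only Googlebot gets through, so any crawler that respects robots.txt, CommonCrawl's included, never sees the site.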

I'd love to see Google, Bing and others being incentivized (wink, wink) to contribute (technically, financially, etc.) to CommonCrawl or the Internet Archive, since those projects already do this kind of crawling.