Comment by unknownaccount

4 years ago

I don’t know why you are framing this as an impossible task. It doesn’t need to be on the scale of Bing/Google to function. There are already some self-hosted search engine solutions that work okay. Just filter out all the trash sites with low-quality content like Facebook/Twitter from the database, and that 300TB Common Crawl could probably be cut down to a more reasonable 200TB. Filter out non-English results and it probably halves again, to around 100TB. I’m seeing 8TB drives on Newegg for $129. It absolutely does not take anywhere on the order of “days” to query a properly optimized db of this size.
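
To put rough numbers on the storage side, here is a back-of-the-envelope sketch using only the figures from the comment (300TB crawl, 200TB after filtering, half of that for English only, 8TB drives at $129). The function names are made up for illustration, and the estimate assumes raw capacity only: no redundancy, no index overhead, no servers or enclosures, so treat it as a lower bound rather than a real budget.

```python
import math

def drives_needed(dataset_tb: float, drive_tb: float = 8.0) -> int:
    """Whole drives required to hold dataset_tb of raw data (no redundancy)."""
    return math.ceil(dataset_tb / drive_tb)

def raw_drive_cost(dataset_tb: float, drive_tb: float = 8.0,
                   price_per_drive: float = 129.0) -> float:
    """Cost of the bare drives only, at the quoted per-drive price."""
    return drives_needed(dataset_tb, drive_tb) * price_per_drive

# Sizes taken from the comment; the English-only figure is the "halved" guess.
scenarios = {
    "full crawl": 300.0,
    "trash sites filtered": 200.0,
    "English only": 200.0 / 2,
}

for label, size_tb in scenarios.items():
    n = drives_needed(size_tb)
    cost = raw_drive_cost(size_tb)
    print(f"{label}: {size_tb:.0f}TB -> {n} drives, ${cost:,.0f}")
```

Under those assumptions the English-only set fits on 13 drives for about $1,677 in disks. This says nothing about query latency, which depends on how the index is built rather than on raw capacity.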