
Comment by mbac32768

14 days ago

so... is it realistic to self-host a search engine yet?

what's the napkin math on this?
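For the napkin math, storage can be roughed out as pages × extracted text per page, plus some overhead for the index itself. The Python below is a minimal sketch of that arithmetic. The function names and every input (10 million pages, about 5 KB of plain text per page, an index about 30% the size of the raw text) are hypothetical assumptions for illustration, not measured figures, so substitute your own.

```python
# Back-of-envelope storage sizing for a self-hosted full-text index.
# All inputs are hypothetical assumptions, not measurements of any real corpus.

def estimate_index_bytes(num_docs: int,
                         avg_text_bytes: int,
                         index_to_text_ratio: float) -> dict:
    """Return rough storage needs in bytes.

    num_docs            -- number of pages you plan to index (assumed)
    avg_text_bytes      -- extracted plain text per page (assumed)
    index_to_text_ratio -- inverted-index size as a fraction of raw text (assumed)
    """
    raw_text = num_docs * avg_text_bytes
    index = round(raw_text * index_to_text_ratio)
    return {"raw_text": raw_text, "index": index, "total": raw_text + index}


def human(n: float) -> str:
    """Format a byte count using decimal (SI) units."""
    for unit in ("B", "KB", "MB", "GB", "TB", "PB"):
        if n < 1000:
            return f"{n:.1f} {unit}"
        n /= 1000
    return f"{n:.1f} EB"


if __name__ == "__main__":
    # Illustrative inputs only: 10M pages, ~5 KB text each, index ~30% of text.
    est = estimate_index_bytes(10_000_000, 5_000, 0.3)
    for name, size in est.items():
        print(f"{name:>8}: {human(size)}")
```

With those assumed inputs it works out to 50 GB of text plus 15 GB of index, about 65 GB in total. Crawl bandwidth, compute, and ranking quality are not modeled here.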

The challenge is the indexing. There's probably something to be said for a personal search engine that ingests only the sources you yourself are likely to want and need, but the data moat is so big that antitrust enforcers have gone as far as considering forcing Google to share its index with competitors to give them a chance. Even Bing, which is likely behind a large chunk of third-party searches, is likely sourcing much of its data from settings in Edge that spy on your Google searches.
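A personal engine limited to hand-picked sources reduces, at its simplest, to an allowlist filter in front of an inverted index (a map from each term to the documents containing it). The Python below is a minimal sketch under that assumption. The class, the example allowlist domains, and the AND-only query semantics are illustrative choices, not a description of how any real engine works, and it does no crawling, ranking, or persistence.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical allowlist; these domains are only examples.
ALLOWED_DOMAINS = {"stackexchange.com", "github.com", "en.wikipedia.org"}

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return TOKEN_RE.findall(text.lower())


class InvertedIndex:
    def __init__(self, allowed_domains):
        self.allowed = set(allowed_domains)
        self.postings = defaultdict(set)  # term -> set of doc ids
        self.urls = []                    # doc id -> url

    def _is_allowed(self, url):
        # Accept the domain itself or any subdomain (e.g. unix.stackexchange.com).
        host = urlparse(url).hostname or ""
        return any(host == d or host.endswith("." + d) for d in self.allowed)

    def add(self, url, text):
        """Index a page if its host is allowlisted; return whether it was added."""
        if not self._is_allowed(url):
            return False
        doc_id = len(self.urls)
        self.urls.append(url)
        for term in set(tokenize(text)):
            self.postings[term].add(doc_id)
        return True

    def search(self, query):
        """Return URLs of documents containing every query term (AND semantics)."""
        terms = tokenize(query)
        if not terms:
            return []
        ids = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return [self.urls[i] for i in sorted(ids)]


if __name__ == "__main__":
    idx = InvertedIndex(ALLOWED_DOMAINS)
    idx.add("https://unix.stackexchange.com/q/1", "How to grep recursively")
    idx.add("https://www.reddit.com/r/example", "grep recursively tips")
    print(idx.search("grep recursively"))
```

In this example the Stack Exchange page is indexed, the reddit.com page is rejected by the allowlist, and the search returns only the Stack Exchange URL. Excluding a site is simply a matter of never adding it to the allowlist.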

  • Yeah, I increasingly feel like I just want an index of:

    * stackexchange

    * github

    * the top 10,000,000 blogs

    * major recipe sites

    * English Wikipedia

    * abstracts of all scientific papers

    * very much don't want Reddit

    Everything else I'll probably punt to ChatGPT.

    Maybe I can ping Common Crawl and just index that and call it good.

    • I find the last one ironic because many people prefer searching Reddit over the web for answers. I remember a time when appending site:reddit.com to every query was popular, before Reddit started selling search access.
