Comment by ocdtrekkie

15 days ago

The challenge is the indexing. There's probably something to be said for making a personal search engine that just ingests sources you yourself are likely to want and need, but antitrust enforcers have gone as far as considering forcing Google to share its index with competitors to give them a chance, because the data moat is so big. Even Bing, which is likely behind a large chunk of third-party searches, is likely primarily sourcing its data from settings in Edge that spy on your Google searches.

Yeah, feels like I increasingly just want an index of

* Stack Exchange

* GitHub

* the top 10,000,000 blogs

* major recipe sites

* English Wikipedia

* abstracts of all scientific papers

* very much don't want Reddit

Everything else I'll probably punt to ChatGPT

Maybe I can ping Common Crawl and just index that and call it good.
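
For what it's worth, the "only index the sources I want" part could be as simple as a hostname filter run over whatever URL list the crawl gives you. A rough sketch, assuming you already have the URLs in hand; the suffix lists and the `should_index` name are made up for illustration and nothing here talks to Common Crawl itself:

```python
from urllib.parse import urlsplit

# Hypothetical lists, picked only to mirror the wishlist above.
ALLOWED_SUFFIXES = ("stackexchange.com", "github.com", "en.wikipedia.org")
BLOCKED_SUFFIXES = ("reddit.com",)


def _matches(host, suffixes):
    # True if host is exactly a listed domain or a subdomain of one.
    return any(host == s or host.endswith("." + s) for s in suffixes)


def should_index(url):
    # Extract the hostname; strings that are not URLs yield an empty host.
    host = (urlsplit(url).hostname or "").lower()
    # The blocklist wins over the allowlist.
    if _matches(host, BLOCKED_SUFFIXES):
        return False
    return _matches(host, ALLOWED_SUFFIXES)
```

Matching on a leading dot keeps lookalike domains such as notgithub.com out, and anything not on the allowlist is dropped by default.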

  • I find the last one ironic because many people prefer to search Reddit over the web for answers to things. I remember there was a time when adding site:reddit.com to every query was popular, before Reddit started selling search access.

    • ChatGPT has replaced that for me mostly. Not worth the noise/gamified junk in the Reddit results.