Comment by visarga

1 month ago

> Search engines affect me less, and less every day. I have my own small "index" / "bookmarks" with many domains, github projects, youtube channels

Exactly. Why can't we just hoard our bookmarks and a list of curated sources, say 1M or 10M small search stubs, and have an LLM direct the scraping operation?

The idea is to have starting points for a scraper, such as blogs, awesome lists, specialized search engines, news sites, docs, etc. On a given query the model only needs a few starting points to find fresh information. Hosting a few GB of compact search stubs could go a long way towards search independence.

This could mean replacing Google. You can even go fully local with a local LLM + code sandbox + search stub index + scraper.
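
A rough sketch of the stub index idea, to make it concrete. Everything here is an assumption for illustration: the `Stub` record, the `pick_starting_points` helper, and the example.org URLs are hypothetical, and plain token overlap stands in for the step where an LLM would choose which sources to scrape. No scraping happens; it only shows how a query gets narrowed to a few starting points.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Stub:
    """One hypothetical 'search stub': a starting point plus a short summary."""
    url: str          # where a scraper would start (placeholder URLs below)
    description: str  # compact text describing what the source covers


def tokenize(text: str) -> set[str]:
    # Lowercase alphanumeric tokens; deliberately crude.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def pick_starting_points(query: str, stubs: list[Stub], k: int = 3) -> list[Stub]:
    """Return up to k stubs whose descriptions share the most tokens with the query.

    Stand-in for the LLM: a real setup might ask the model to rank candidates.
    """
    q = tokenize(query)
    scored = [(len(q & tokenize(s.description)), s) for s in stubs]
    scored = [(score, s) for score, s in scored if score > 0]  # drop non-matches
    scored.sort(key=lambda pair: pair[0], reverse=True)        # best overlap first
    return [s for _, s in scored[:k]]


# Tiny made-up index; a real one would hold millions of entries.
STUBS = [
    Stub("https://example.org/rust-blog", "blog about rust async programming and tokio"),
    Stub("https://example.org/awesome-ml", "awesome list of machine learning papers and tools"),
    Stub("https://example.org/py-docs", "python standard library documentation"),
]

hits = pick_starting_points("rust async runtime", STUBS)
print([s.url for s in hits])
```

The selected URLs would then be handed to the scraper, so only a handful of pages get fetched per query instead of crawling the whole web.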


direwolf20 1 month ago

Marginalia Search does something like this