Comment by drBonkers

3 years ago

Many search-engine posts recently. When will someone make the Search Engine for Search Engines?

I do think some form of collaboration between small search engines would be very beneficial. I've been thinking about how to make that happen. So far I've added a public API to my search engine, and published some data.

Not sure what is a good way of creating a space for collaboration...

  • I created a metasearch for myself based on the idea of "continuation searches". One obvious point of collaboration between search engines could be a uniform API and SERP format. Currently, search engines vary slightly in their submission URL syntax/parameters and in the HTML used to display results, not to mention the HTTP method, limits on the number of results and, sometimes, additional optional URL parameters. The differences are generally small^1, which makes it relatively easy to create a personal metasearch. However, it would be much less cumbersome if these differences were eliminated.

    1. Exceptions are, e.g., ones that require two HTTP requests per query, such as Gigablast, or ones that have strange limitations, e.g., Startpage, which has become unusable for me without JavaScript. Contacting their "customer support" yielded no response.
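
To make the kind of per-engine variation described above concrete, here is a rough sketch of how a personal metasearch might paper over it. Everything in it is hypothetical: the engine names, the example.com-style URLs, the parameter names and the JSON field names are placeholders, not the API of any real search engine, and it builds requests without sending them.

```python
from dataclasses import dataclass, field
from typing import Optional
from urllib.parse import urlencode


@dataclass(frozen=True)
class EngineSpec:
    """Per-engine quirks: URL, query parameter, HTTP method, result limit."""
    name: str
    base_url: str                 # hypothetical endpoint
    query_param: str              # name of the parameter carrying the query
    method: str = "GET"
    max_results: int = 10
    extra_params: dict = field(default_factory=dict)
    # Maps uniform field names to this engine's (assumed) field names.
    field_map: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Request:
    method: str
    url: str
    body: Optional[str]


@dataclass(frozen=True)
class Result:
    """The uniform result format every engine is normalized into."""
    title: str
    url: str
    snippet: str


def build_request(spec: EngineSpec, query: str) -> Request:
    """Turn one query into the request shape a given engine expects."""
    encoded = urlencode({spec.query_param: query, **spec.extra_params})
    if spec.method == "GET":
        return Request("GET", f"{spec.base_url}?{encoded}", None)
    # Non-GET engines get the parameters as a form-encoded body.
    return Request(spec.method, spec.base_url, encoded)


def normalize(spec: EngineSpec, raw_hits: list) -> list:
    """Map engine-specific hit dicts onto the uniform Result format."""
    fm = spec.field_map
    return [
        Result(
            title=hit[fm["title"]],
            url=hit[fm["url"]],
            snippet=hit.get(fm["snippet"], ""),
        )
        for hit in raw_hits[: spec.max_results]
    ]


# Two made-up engines that differ in the ways described above.
ENGINE_A = EngineSpec(
    name="engine-a",
    base_url="https://engine-a.example/search",
    query_param="q",
    extra_params={"format": "json"},
    field_map={"title": "title", "url": "link", "snippet": "desc"},
)
ENGINE_B = EngineSpec(
    name="engine-b",
    base_url="https://engine-b.example/api",
    query_param="query",
    method="POST",
    max_results=2,
    field_map={"title": "name", "url": "href", "snippet": "summary"},
)
```

With a shared API and SERP format, the `EngineSpec` table and the `field_map` entries would collapse to a single case, which is the cumbersome part this would remove.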

    Even better would be if search engines all shared their indexes and made these available for download. This would facilitate people building new search engines without needing to have their own index. In theory it would also put a stop to the problem of people submitting large numbers of queries, since all the bulk data they need would be available for download. WWW indexes that comprise public information could be freely shared as public data.

    • An index is, lowballing it, hundreds of gigabytes of dense binary soup; probably in some custom format specific to that search engine (sometimes there's some form of hash table going on, sometimes a B-tree), almost certainly with its own idiosyncrasies concerning keyword extraction. I think reconciling API differences is probably a lot easier than making use of index data.
