Comment by m-i-l
3 years ago
I still quite like the idea of having a number of independent search engines each indexing their own specialist subjects, and one or more federated search front-ends which can pull these together.
Doing it with APIs is tricky to make work in a usable way though. There have been various attempts at standardised APIs, e.g. OpenSearch[0], and metasearch engines like searX[1] have what are essentially pluggable scrapers, but there are still fundamental issues, like getting different results at different times and having to reconcile the engines' different ranking mechanisms.
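To make the problem concrete, here's a minimal sketch in Python of what a federated front-end has to do. The engine URLs and the JSON response shape are made up for illustration; real engines each expose their own format and their own incomparable score scale, which is exactly the ranking issue above:

    import concurrent.futures
    import json
    import urllib.parse
    import urllib.request
    from itertools import zip_longest

    # Hypothetical specialist indexes; the URLs and the assumed response
    # shape {"results": [{"url": ..., "title": ...}, ...]} are illustrative.
    ENGINES = [
        "https://books.example.org/search?q=",
        "https://papers.example.org/search?q=",
    ]

    def query_engine(base_url, terms):
        with urllib.request.urlopen(base_url + urllib.parse.quote(terms),
                                    timeout=5) as resp:
            return json.load(resp).get("results", [])

    def federated_search(terms):
        # Query all engines in parallel; one slow engine still delays the
        # whole results page, which is part of the usability problem.
        with concurrent.futures.ThreadPoolExecutor(max_workers=len(ENGINES)) as pool:
            result_lists = list(pool.map(lambda u: query_engine(u, terms), ENGINES))
        # Per-engine scores aren't comparable, so rather than sorting on
        # them, round-robin interleave by rank position -- crude, but it
        # doesn't pretend the scores mean the same thing.
        merged, seen = [], set()
        for tier in zip_longest(*result_lists):
            for hit in tier:
                if hit is not None and hit["url"] not in seen:
                    seen.add(hit["url"])
                    merged.append(hit)
        return merged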
Integrating at the index level could make for a more usable search, but that approach has plenty of issues of its own, e.g. those seen with Apache Solr's Cross Data Center Replication[2]. And yes, the volume of data may also be an issue, given a search index is typically slightly larger than the compressed data it covers: the 16M Wikipedia docs, for example, are approx 32GB compressed and approx 40.75GB as a search index.
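A quick back-of-envelope check of that sizing claim, using the figures just quoted:

    # Index size is a modest constant factor over the compressed corpus,
    # so storage for each specialist node scales roughly linearly with
    # what it chooses to index.
    compressed_gb = 32.0    # ~16M Wikipedia docs, compressed
    index_gb = 40.75        # same docs in a search index
    print(f"index/compressed ratio: {index_gb / compressed_gb:.2f}x")  # -> 1.27x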
[0] https://github.com/dewitt/opensearch (unrelated to Amazon's Elasticsearch fork)
[1] https://github.com/searx/searx
[2] https://solr.apache.org/guide/8_11/cross-data-center-replica...