Comment by ashvardanian

5 months ago

Lucene is tough to deal with. About 15 hours ago — right when this comment was posted — I was giving a talk at Databricks comparing the world’s most widely used search engines. I’ve never run into as many issues with any other similar tool as I did with Lucene. To be fair, it’s been around for ~26 years and has aged remarkably well... but it’s the last thing I’d choose today.

Can I ask you which alternatives exist at the layer Lucene occupies?

I went looking around last year and couldn’t really find many options, but I might have been looking in the wrong places.

  • For Vector Search the top 2 are: Meta’s FAISS and (my) Unum’s USearch. Lucene powers Elastic, Solr, MongoDB Atlas, AWS OpenSearch, Azure Cognitive Search. USearch powers ClickHouse, DuckDB, YugaByte, TiDB, ScyllaDB, MemGraph, KuzuDB, Lantern, and a few big closed source names that don’t mention it, as far as I know. FAISS has the highest usage among Python developers, but if you are indexing large collections you should consider alternatives.

Interesting, then, that Vectroid would choose to fork it.

Elasticsearch is at least good / at hiding the Lucene zoo under the hood.