Comment by throwaway81523

7 hours ago

Gad, they sure like to say "BM25" over and over again. That's a near-worthless approach to result ranking. Doing even a halfway OK job requires approaches that are much better tuned, more powerful, or both.

It's common to run a hybrid: BM25 combined with another fuzzy search method or with vector search via pgvector.
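
One common way to build such a hybrid is reciprocal rank fusion (RRF), which merges ranked lists without needing the scores to be comparable. The sketch below is only an illustration, not anything the linked project is confirmed to do: the function name, the document ids, and the two input rankings are made up, and `k=60` is just the conventional default.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document gets 1 / (k + rank) from every list it appears in;
    documents ranked well by several retrievers float to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending), breaking ties by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical top-3 results from a BM25 query and a vector (e.g. pgvector) query.
bm25_ranking = ["doc3", "doc1", "doc7"]
vector_ranking = ["doc1", "doc9", "doc3"]

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
```

Documents that both retrievers return (`doc1`, `doc3`) end up ahead of those returned by only one, which is the main appeal: neither ranker has to be right on its own.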

  • BM25 is quite bad, and its corpus statistics and parameters have to be recomputed and retuned for each new corpus. SPLADEv2 is much better, and there are even better sparse embeddings these days.
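
For reference, a minimal sketch of where BM25's corpus dependence comes from. Strictly speaking nothing is trained: the document frequencies and average document length are computed from the corpus, and `k1` and `b` are free parameters that people tune. The function name and toy documents are illustrative, and the IDF form used is the "+1 inside the log" variant, which is an assumption about which BM25 flavor is meant.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against `query_terms`."""
    n = len(docs)
    # Corpus-level statistics: these change whenever the corpus changes.
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Length normalization: long documents are penalized via b.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    ["sparse", "retrieval"],
    ["dense", "retrieval", "model"],
    ["cooking", "pasta"],
]
scores = bm25_scores(["sparse", "retrieval"], docs)
```

Learned sparse models such as SPLADE replace the raw term counts with learned term weights, which is the alternative the comment above is pointing at.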