Comment by throwaway81523
7 hours ago
Gad, they sure like to say "BM25" over and over again. That's a near worthless approach to result ranking. Doing any halfway ok job requires much more tuned and/or more powerful approaches.
7 hours ago
Gad, they sure like to say "BM25" over and over again. That's a near worthless approach to result ranking. Doing any halfway ok job requires much more tuned and/or more powerful approaches.
Can you please elaborate why?
It's common to do a hybrid of BM25 with other fuzzy search or pgvector.
BM25 is quite bad and needs to be retrained for each corpus anew. SPLADEv2 is much better and there are even better sparse embeddings these days.