Comment by isaacfung

2 years ago

https://twitter.com/eugeneyan/status/1678060204943097863

>When Deepmind needs semantic retrieval, they just use the largest index on the planet.

Fun fact: Query-doc similarity was done via simple TF-IDF instead of dense vectors. It performed better than vector retrieval when the number of retrieved docs was > 45 (they used 50).
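
For anyone who wants a feel for what that looks like, TF-IDF query-doc similarity is just sparse lexical vectors plus cosine similarity. A minimal sketch with scikit-learn (a toy example of mine, not DeepMind's actual pipeline):

```python
# Minimal TF-IDF query-doc retrieval sketch (illustrative only).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "Retrieval-augmented models condition generation on retrieved passages.",
    "BM25 and TF-IDF are classic sparse lexical retrieval baselines.",
    "Dense retrievers embed queries and documents into a shared vector space.",
]

vectorizer = TfidfVectorizer(stop_words="english")
doc_matrix = vectorizer.fit_transform(docs)   # sparse TF-IDF vectors for the docs

query = "sparse lexical retrieval with tf-idf"
query_vec = vectorizer.transform([query])

# Cosine similarity between the query and every doc; take the top-k.
scores = cosine_similarity(query_vec, doc_matrix).ravel()
for i in scores.argsort()[::-1][:2]:
    print(f"{scores[i]:.3f}  {docs[i]}")
```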

https://blog.vespa.ai/improving-zero-shot-ranking-with-vespa...

>This case illustrates that in-domain effectiveness does not necessarily transfer to an out-of-domain zero-shot application of the model. Generally, as observed on the BEIR dense leaderboard, dense embedding models trained on NQ labels underperform the BM25 baseline across almost all BEIR datasets.
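
The BM25 baseline they compare against needs no training labels at all, which is part of why it holds up so well out of domain. A minimal sketch of such a baseline using the rank_bm25 package (my example, not the blog's actual Vespa configuration):

```python
# Minimal BM25 baseline sketch with rank_bm25 (illustrative only).
from rank_bm25 import BM25Okapi

corpus = [
    "in-domain training labels come from natural questions",
    "zero-shot transfer to other BEIR datasets is much harder",
    "bm25 needs no training labels at all",
]
tokenized_corpus = [doc.split() for doc in corpus]

bm25 = BM25Okapi(tokenized_corpus)

query = "zero-shot bm25 baseline".split()
print(bm25.get_scores(query))              # BM25 score for every doc
print(bm25.get_top_n(query, corpus, n=1))  # best-matching doc
```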