Comment by isaacfung
2 years ago
https://twitter.com/eugeneyan/status/1678060204943097863
>When Deepmind needs semantic retrieval, they just use the largest index on the planet.
Fun fact: query-doc similarity was computed with plain TF-IDF rather than dense vectors. It outperformed vector retrieval once the number of retrieved docs exceeded 45 (they used 50).
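For anyone unfamiliar, TF-IDF query-doc scoring is just cosine similarity over sparse term-weight vectors. A rough sketch with scikit-learn (my own illustration with made-up docs, not the DeepMind/RETRO pipeline):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "Retrieval-augmented models fetch supporting passages at inference time.",
    "BM25 and TF-IDF are classic sparse lexical retrieval baselines.",
    "Dense retrievers embed queries and documents into a shared vector space.",
]
query = ["sparse lexical retrieval baselines"]

vectorizer = TfidfVectorizer()
doc_vecs = vectorizer.fit_transform(docs)           # fit vocabulary on the corpus
query_vec = vectorizer.transform(query)             # reuse the same vocabulary

scores = cosine_similarity(query_vec, doc_vecs)[0]  # one score per doc
top_k = scores.argsort()[::-1][:2]                  # indices of the top-2 docs
print([(i, round(scores[i], 3)) for i in top_k])
```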
https://blog.vespa.ai/improving-zero-shot-ranking-with-vespa...
>This case illustrates that in-domain effectiveness does not necessarily transfer to an out-of-domain zero-shot application of the model. Generally, as observed on the BEIR dense leaderboard, dense embeddings models trained on NQ labels underperform the BM25 baseline across almost all BEIR datasets.