Comment by simonw
1 year ago
It turns out picking that threshold is extremely difficult - I've tried! The value seems to differ for different searches, so picking eg 0.7 as a fixed value isn't actually as useful as you would expect.
1 year ago
It turns out picking that threshold is extremely difficult - I've tried! The value seems to differ for different searches, so picking eg 0.7 as a fixed value isn't actually as useful as you would expect.
Agreed that thresholds don't work when applied to the cosine similarity of embeddings. But I have found that the similarity score returned by high-quality rerankers, especially Cohere, are consistent and meaningful enough that using a threshold works well there.
I use similarity threshold (to remove absolutely irrelevant results) and then use a reranker to get Top N.