Comment by romanhn

4 days ago

Say I generate embeddings for a bunch of articles. Given the query "articles about San Francisco that don't mention cars" would cosine similarity uprank or downrank the car mentions? Assuming exclusions aren't handled well, what techniques might I use to support them?

This needs testing, but you will likely get a "don't think about a pink elephant" effect: for most embedding models, the query "articles about San Francisco that don't mention cars" ends up closest to articles about SF that *do* mention cars, because the word "cars" still pulls the query vector toward car-related content.

The fundamental issue here is comparing apples to oranges: questions and answers are different kinds of text, and a single similarity score has no way to express "must not mention."

I think you have to split it into a separate negative query ("cars"), rank the documents against that as well, and then combine the two rankings yourself, e.g. by subtracting the negative similarity from the positive one.
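A minimal sketch of that combination step, using random vectors as stand-ins for real embeddings (in practice you would call your embedding model for the documents and both queries; the penalty weight `alpha` is a hypothetical knob to tune on your data):

```python
import numpy as np

def cosine_sim(query, docs):
    # Cosine similarity between one query vector and each row of a doc matrix.
    q = query / np.linalg.norm(query)
    d = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    return d @ q

# Toy stand-ins for real embeddings.
rng = np.random.default_rng(0)
docs = rng.normal(size=(5, 8))    # 5 article embeddings
pos_query = rng.normal(size=8)    # e.g. "articles about San Francisco"
neg_query = rng.normal(size=8)    # e.g. "cars"

alpha = 0.5  # hypothetical exclusion weight; tune empirically
scores = cosine_sim(pos_query, docs) - alpha * cosine_sim(neg_query, docs)
ranking = np.argsort(-scores)     # best matches first, car-heavy docs pushed down
```

With `alpha` large enough, any document similar to the negative query drops to the bottom regardless of its positive score; a softer alternative is to hard-filter documents whose negative similarity exceeds a threshold.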