Comment by romanhn

4 days ago

Say I generate embeddings for a bunch of articles. Given the query "articles about San Francisco that don't mention cars" would cosine similarity uprank or downrank the car mentions? Assuming exclusions aren't handled well, what techniques might I use to support them?

This needs testing, but you will likely get a "don't think about a pink elephant" effect: for most embedding models, the query "articles about San Francisco that don't mention cars" ends up closest to articles about SF that *do* mention cars, because the word "cars" still pulls the query vector toward car-related content.

The fundamental issue here is comparing apples to oranges: questions and answers are different kinds of text, and a single similarity score has no way to express "must not mention."

I think you have to split it into a separate negative query ("cars"), rank the documents against that as well, and then combine the two rankings yourself, e.g. by subtracting the negative similarity from the positive one.
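A minimal sketch of that combination step, using random vectors as stand-ins for real embeddings (in practice you would call your embedding model for the documents and both queries; the penalty weight `alpha` is a hypothetical knob to tune on your data):

```python
import numpy as np

def cosine_sim(query, docs):
    # Cosine similarity between one query vector and each row of a doc matrix.
    q = query / np.linalg.norm(query)
    d = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    return d @ q

# Toy stand-ins for real embeddings.
rng = np.random.default_rng(0)
docs = rng.normal(size=(5, 8))    # 5 article embeddings
pos_query = rng.normal(size=8)    # e.g. "articles about San Francisco"
neg_query = rng.normal(size=8)    # e.g. "cars"

alpha = 0.5  # hypothetical exclusion weight; tune empirically
scores = cosine_sim(pos_query, docs) - alpha * cosine_sim(neg_query, docs)
ranking = np.argsort(-scores)     # best matches first, car-heavy docs pushed down
```

With `alpha` large enough, any document similar to the negative query drops to the bottom regardless of its positive score; a softer alternative is to hard-filter documents whose negative similarity exceeds a threshold.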