Comment by kburman
2 days ago
I recently learned that semantic search embeddings mostly represent topics and concepts, but they don’t handle negation or emotion very well.
For example, if you search for “paintings of winter landscapes but without sun and trees,” you’ll still get results with trees. That’s because embeddings capture the presence of concepts like “tree” or “landscape,” but not logical relationships like “without” or “not.”
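A toy illustration of this effect, as a rough sketch only: the `embed` function below is a hypothetical bag-of-words stand-in, not a real embedding model, and the two document strings are made up. It only shows how a representation driven by which concepts are present can rank a painting with trees above one without them, because "without" is just one more token.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (assumption, not a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


query = "winter landscape without sun and trees"

# Made-up captions standing in for indexed paintings.
docs = {
    "with_trees": "winter landscape with sun and trees",
    "no_trees": "winter landscape of a frozen lake under grey sky",
}

scores = {name: cosine(embed(query), embed(text)) for name, text in docs.items()}

# Print results from most to least similar to the query.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

In this sketch the caption that mentions sun and trees scores higher, since it shares more tokens with the query. Real embedding models are far richer than word counts, so this is an analogy for the failure mode, not a measurement of any actual model.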
Similarly, embeddings aren’t great at capturing how something feels. They can tell that “sad poem” and “happy poem” are different mainly because of the words used, not because they truly understand emotional tone.
This happens because most embedding models (like OpenAI’s or sentence-transformers) are trained to group things by semantic similarity, not logical meaning or sentiment. Negation, polarity, and affect aren’t explicitly represented in the vector space.
Might be common knowledge to some, but it was a cool TIL moment for me, realizing that embeddings are great at what something is about, but not how it feels or what it excludes.
That's actually not correct. Embeddings can handle relationships like “without” or “not” when trained for it. You need to scale up the training massively to make it generalize well. The current version of Mixedbread Search supports negations like "tshirt without stripes". You can see it in our launch video [1]. We are working on a much more general model, which should be able to capture relationships, emotions, and much more. The current models are just limited.
[1]: https://www.mixedbread.com/blog/mixedbread-search
I was referring specifically to popular embedding models like OpenAI’s and sentence-transformers, which (as far as I know) don’t reliably handle negation or emotional nuance; they mostly capture topical similarity.
I don’t know enough of the underlying math to say for sure whether embeddings can be trained to consistently represent negation, but when I tried the Mixedbread demo myself with a query like “winter landscapes without sun and trees”, it still showed me paintings with both sun and trees. So at least in its current form, it doesn’t seem to fully handle those semantic relationships yet.