Comment by pbhjpbhj
5 days ago
That sounds like a problem with the embedding itself; wouldn't you need to renormalise so that low-signal inputs are still well represented? A white square and a red square shouldn't carry different levels of detail. Depending on the purpose of the vector embedding, though, there should be a difference between an image that is mostly white pixels and a partial image.
Disclaimer: I don't know shit.
I should clarify that I experienced these issues with text-embedding-ada-002 and the Azure AI vision model (based on Florence). I have not tested many other embedding models to see if they'd have the same issue.
FWIW I think you're right: we have very different stacks, and I've observed the same thing, though with a much clunkier description than your elegant way of putting it.
I do embeddings on arbitrary websites at runtime and had a persistent problem with the last chunk of a web page matching more queries than it should. In retrospect, it's obvious that the smaller a chunk was, the more it matched everything.
Full details: MSMARCO MiniLM L6V3, inference via ONNX on iOS/web/Android/macOS/Windows/Linux.
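One simple mitigation is to avoid emitting tiny chunks in the first place, e.g. by folding an undersized trailing chunk into the previous one. A rough sketch (the character-based chunker and the thresholds here are purely illustrative, not my actual pipeline):

```python
def chunk_text(text: str, target_chars: int = 1000, min_chars: int = 200) -> list[str]:
    """Naive fixed-size chunker that merges an undersized trailing chunk
    into the previous one, so no chunk is much shorter than the rest."""
    chunks = [text[i:i + target_chars] for i in range(0, len(text), target_chars)]
    if len(chunks) > 1 and len(chunks[-1]) < min_chars:
        chunks[-2] += chunks[-1]
        chunks.pop()
    return chunks
```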
You could also work around this by applying a transformation that centers and rescales the raw embeddings (e.g. sklearn's StandardScaler), fitted on some example data points from your data set. It might introduce some bias, but I've found it helpful in some cases with off-the-shelf embeddings.
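A minimal sketch of what I mean, assuming you have embeddings for a representative sample of your own documents to fit on (the array shapes and the random sample data here are just placeholders):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Placeholder: stand-in for embeddings of a representative sample of your
# own documents (shape: n_samples x embedding_dim).
sample_embeddings = np.random.rand(1000, 1536).astype(np.float32)

# Fit the scaler so each dimension is centered and rescaled to unit variance.
scaler = StandardScaler()
scaler.fit(sample_embeddings)

def transform(raw_embedding: np.ndarray) -> np.ndarray:
    """Center/scale a raw embedding, then re-normalize to unit length
    so cosine similarity still behaves as expected."""
    scaled = scaler.transform(raw_embedding.reshape(1, -1))[0]
    return scaled / np.linalg.norm(scaled)
```

Fitting on your own data is what lets the scaler wash out dimensions that are near-constant across your corpus; just keep the fitted scaler around and apply the same transform to both document and query embeddings.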
Use horrible-quality embeddings and you get horrible results. No surprise there. ada is obsolete; I would never want to use it.