Comment by SubiculumCode

4 days ago

The article is basically saying: if the feature vectors are cryptically encoded, then cosine similarity tells you little.

Cosine similarity of two encrypted images would be useless; decrypt them and it becomes a bit more useful.

In other words, the 'strings are not the territory': the territory is the semantic constructs cryptically encoded into those strings. You want the similarity of the constructs, not of the strings.
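To make the encrypted-image point concrete, here is a rough sketch. Everything in it is made up for illustration: the two "images" are synthetic pixel lists, and `toy_encrypt` is a hypothetical SHA-256 keystream XOR standing in for real encryption (do not use it for anything real). The pixels are centered around 127.5 so that unrelated data scores near 0.

```python
import hashlib
import math


def cosine(a, b):
    """Plain cosine similarity between two equal-length numeric lists."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def keystream(key: bytes, n: int) -> bytes:
    """Toy keystream: SHA-256 over key + counter (illustrative only)."""
    out = bytearray()
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:n])


def toy_encrypt(pixels, key: bytes):
    """XOR each pixel with the keystream; applying it twice decrypts."""
    return [p ^ k for p, k in zip(pixels, keystream(key, len(pixels)))]


def centered(pixels):
    """Shift 0..255 pixel values so they are centered around zero."""
    return [p - 127.5 for p in pixels]


# Two synthetic "images" that differ only by a tiny perturbation.
image_a = [(i * 7) % 256 for i in range(4096)]
image_b = [min(255, p + (i % 3)) for i, p in enumerate(image_a)]

# Each image is encrypted under its own key, as it would be with
# distinct keys or nonces in practice.
enc_a = toy_encrypt(image_a, b"key-a")
enc_b = toy_encrypt(image_b, b"key-b")

raw_sim = cosine(centered(image_a), centered(image_b))
enc_sim = cosine(centered(enc_a), centered(enc_b))
dec_sim = cosine(
    centered(toy_encrypt(enc_a, b"key-a")),
    centered(toy_encrypt(enc_b, b"key-b")),
)

print(f"plaintext: {raw_sim:.4f}")  # close to 1
print(f"encrypted: {enc_sim:.4f}")  # close to 0
print(f"decrypted: {dec_sim:.4f}")  # same as plaintext
```

The two images are nearly identical, but the ciphertexts look unrelated to cosine similarity; the similarity only comes back once you decode to a representation where the structure you care about is visible.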

I can't see any of this in the article, at all.

I think what it says is under "Is it the right kind of similarity?":

> Consider books.
> For a literary critic, similarity might mean sharing thematic elements. For a librarian, it's about genre classification.
> For a reader, it's about emotions it evokes. For a typesetter, it's page count and format.
> Each perspective is valid, yet cosine similarity smashes all these nuanced views into a single number — with confidence and an illusion of objectivity.