Comment by djhn
18 days ago
Counterpoint: try using an LLM for even the coarsest visual-similarity task on something that is extremely abundant in the corpus.
For instance, say you are a woman with a lookalike celebrity: someone who closely matches your hair colour, facial structure, skin tone and body proportions. You would like to browse outfits worn by other celebrities who look very much like her (outfits presumably put together by professional stylists). So you ask an LLM to list celebrities who look like celebrity X, intending to use those names to look up outfit inspiration.
No matter how long the list, no matter how precisely the prompt spells out the features that must be matched, no matter how many rounds of refinement you do, the results will be completely unusable, because broad language dominates more specific language in the corpus.
The LLM cannot adequately model these facets, because language, as people actually use it, is too imprecise.
To dissect just one such facet: the LLM will list dozens of people who share a broad category (red hair), with complete disregard for the exact shade of red, whether the hair is dyed, and whether it is natural hair or a wig.
The number of listicles clustering these actresses together as redheads will dominate anything with more specific qualifiers, like ’strawberry blonde’ (which generally counts as red hair), ’undyed hair’ (which in fact tends to increase the proportion of dyed-hair results, because that’s how linguistic vector similarity sometimes works) and ’natural’ (which again seems to translate into ’the most natural-looking unnatural’, because that’s how language tends to be used).
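The ’undyed’ problem can be made concrete with a toy sketch. This is plain bag-of-words cosine similarity with made-up example strings, not how real LLM embeddings work, but it shows the failure mode: a negated qualifier is just one token among several shared ones, so the query ’undyed red hair’ still scores highest against text about dyed hair.

```python
from collections import Counter
from math import sqrt

def cosine(a: str, b: str) -> float:
    """Cosine similarity between whitespace-tokenized bags of words."""
    ca, cb = Counter(a.split()), Counter(b.split())
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm

query = "undyed red hair"
docs = [
    "dyed red hair",           # the negated concept: shares "red" and "hair"
    "strawberry blonde hair",  # the more specific real-world match, lexically distant
]
scores = [cosine(query, d) for d in docs]
# "dyed red hair" scores ~0.67 vs ~0.33 for "strawberry blonde hair":
# the qualifier "undyed" does nothing to push the dyed result down.
```

Real embedding models are far richer than this, but the underlying pull is similar: surface overlap with the dominant phrasing in the corpus outweighs a single distinguishing qualifier.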