Comment by tantalor
9 months ago
> CLIP embeds the entire image as a single vector, not 170 of them.
Single token?
> GPT-4o must be using a different, more advanced strategy internally
Why?
The embeddings do not offer enough fidelity to recognize fine details in an image, such as handwriting.