
Comment by TeMPOraL

4 days ago

Yes and no. Embeddings can be used in both directions: if you can find the images closest to a search text, you can also identify the tokens or phrases closest in embedding space to any image or cluster of images, and output those. It's a problem that has long been solved in many different ways, e.g.:

https://github.com/pythongosssss/ComfyUI-WD14-Tagger

which uses dedicated models to generate proper booru tags from any image you pass to it.
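To make the reverse direction concrete, here's a minimal sketch of zero-shot tagging with a CLIP-style dual encoder via sentence-transformers; the candidate tag list and file name are made-up placeholders, not what any of these apps actually use:

```python
from PIL import Image
from sentence_transformers import SentenceTransformer, util

# CLIP embeds images and text into the same space,
# so "closest phrase to this image" is just a cosine-similarity lookup.
model = SentenceTransformer("clip-ViT-B-32")

# Hypothetical tag vocabulary; a real system would use a much larger list.
candidate_tags = ["sunset", "outerwear", "people", "city", "outdoors", "room"]
text_emb = model.encode(
    [f"a photo of {t}" for t in candidate_tags], convert_to_tensor=True
)

# Embed the image and rank the candidate tags by similarity.
img_emb = model.encode(Image.open("photo.jpg"), convert_to_tensor=True)
scores = util.cos_sim(img_emb, text_emb)[0]

for tag, score in sorted(zip(candidate_tags, scores.tolist()), key=lambda x: -x[1]):
    print(f"{tag}: {score:.3f}")
```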

More importantly, I know for sure they have this capability in practice, because if you tap the right way in the right app, when the Moon is in just the right phase, both Samsung Gallery and OneDrive Photos do (or, in OneDrive's case, used to):

- Provide occasional completions and suggestions for predefined categories, like "sunset" or "outerwear" or "people", etc.;

- Auto-tag photos with some subset of those (OneDrive, which also sometimes records them in the metadata), or, if you use the "edit tag" option, suggest the best-fitting tags (Samsung);

- Have a semi-random list of "Things" to choose from to categorize your photos, such as "Sunsets", "City", "Outdoors", "Room", etc. Google Photos does that one too.

This shows they do maintain a list of correct and recommended classifications. They just choose to keep it hidden.

With regard to face recognition, it's even worse. There are zero controls and zero information, other than the occasionally matched (and often mismatched) face under photo properties, which you can sometimes delete.