
Comment by bobosha

6 days ago

This is where the memory bit comes in: if you have a memory of past embeddings and their associated label(s), it could be an ANN query to fetch the most similar embeddings and infer the label from them.
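Here's a rough sketch of what I mean, assuming the memory is just a list of (embedding, label) pairs and using brute-force cosine similarity in place of a real ANN index; all the names here are made up for illustration:

```python
import numpy as np

class EmbeddingMemory:
    """Toy memory of past embeddings and their labels (hypothetical)."""

    def __init__(self):
        self.vectors = []   # stored embedding vectors (unit-normalized)
        self.labels = []    # label associated with each stored embedding

    def add(self, embedding, label):
        # Normalize so a dot product later equals cosine similarity.
        v = np.asarray(embedding, dtype=np.float32)
        self.vectors.append(v / np.linalg.norm(v))
        self.labels.append(label)

    def query(self, embedding, k=5):
        # Brute-force nearest-neighbour search; a real system would swap
        # this for an ANN index (FAISS, HNSW, etc.) for large memories.
        if not self.vectors:
            return []
        q = np.asarray(embedding, dtype=np.float32)
        q = q / np.linalg.norm(q)
        sims = np.stack(self.vectors) @ q          # cosine similarity to each stored vector
        top = np.argsort(-sims)[:k]                # indices of the k most similar
        return [(self.labels[i], float(sims[i])) for i in top]
```

The returned neighbours could then be aggregated (e.g. a majority vote over their labels) to infer a label for the new input.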

But an embedding is more like a one-way hash, kind of like SHA-1 or MD5, no? You can get from the input data to a hash value but not the other way around, right? I know that semantically related inputs end up with nearby embedding vectors, but those clusters could be really sparse in such a high-dimensional space, so the nearest values in a cache may be too far away to be useful?
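To make that concern concrete, a lookup could refuse to infer anything when even the best cached neighbour falls below some similarity threshold; this builds on the toy `EmbeddingMemory` sketch above, and the threshold value is arbitrary:

```python
def infer_label(memory, embedding, k=5, min_similarity=0.8):
    # If the nearest cached embeddings are too far away (low cosine
    # similarity), treat the memory as having no useful answer.
    neighbours = memory.query(embedding, k=k)
    neighbours = [(label, sim) for label, sim in neighbours if sim >= min_similarity]
    if not neighbours:
        return None  # cache miss: nothing close enough to trust
    # Otherwise, majority vote over the neighbours that survived the cut.
    labels = [label for label, _ in neighbours]
    return max(set(labels), key=labels.count)
```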

BTW I'm very much not an expert here and I'm just trying to understand how this system works end to end. Don't take anything I write here as authoritative.