Comment by yazaddaruvala
2 days ago
At least in theory: if the model is the same, the embeddings can be reused rather than recomputed. I believe this is what they mean.
In practice, how fast will the model change (including the tokenizer)? How fast will the vector DB be fully backfilled to match the new model version?
That would be the "cache hit rate", of sorts, and how much it helps likely depends on those variables for your specific corpus and query volume.
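Roughly the kind of thing I mean, as a sketch (all names here are made up; `embed_fn` stands in for whatever embedding call your provider exposes):

```python
import hashlib

# Hypothetical sketch: reuse stored vectors only when the embedding model
# (and therefore tokenizer) matches the one that produced them.
EMBED_MODEL_VERSION = "embed-v3"  # assumed identifier, not a real model name

vector_db = {}  # (model_version, content_hash) -> vector

def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def get_embedding(text: str, embed_fn) -> list[float]:
    """Return a cached vector if it was built by the current model,
    otherwise recompute and backfill. The "hit rate" depends on how
    often EMBED_MODEL_VERSION changes vs. how fast you backfill."""
    key = (EMBED_MODEL_VERSION, content_hash(text))
    if key in vector_db:
        return vector_db[key]   # cache hit: same model, reuse the vector
    vector = embed_fn(text)     # cache miss: recompute with current model
    vector_db[key] = vector     # backfill under the new version
    return vector
```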
> the embeddings can be reused by the model
I can't find any evidence that this is possible with Gemini or any other LLM provider.
Yeah, assuming what you're saying is true and continues to be, it seems the embeddings would just be useful as a "nice corpus search" mechanism for regular RAG.
This can't be what they mean. Even if this were somehow possible, embeddings lose information and are not reversible, i.e. they don't magically compress text into a vector in a way that lets a model implicitly recover the source text from the vector.
LLMs can't take embeddings as input (unless I'm really confused). Even if they could, the embeddings would have lost all word sequence and structure, so they wouldn't make sense to the LLM.
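A toy illustration of the lossiness (this is a hashed bag-of-words, nothing like a real embedding model, but it shares the properties that matter: a fixed-size vector that drops word order and can't be inverted back to text):

```python
import hashlib
import numpy as np

DIM = 8  # toy dimension; real models use 768/1536/etc., but it's still fixed

def toy_embed(text: str) -> np.ndarray:
    """Deliberately crude bag-of-words hashing embedding, for illustration only."""
    vec = np.zeros(DIM)
    for token in text.lower().split():
        h = int(hashlib.md5(token.encode()).hexdigest(), 16)
        vec[h % DIM] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

a = toy_embed("the dog bit the man")
b = toy_embed("the man bit the dog")   # same words, opposite meaning
print(np.allclose(a, b))               # True: word order is gone

short = toy_embed("hello")
long = toy_embed("hello world " * 5_000)  # thousands of words...
print(short.shape, long.shape)            # (8,) (8,): same fixed size either way,
                                          # so the full text can't be recovered
```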