Comment by letitgo12345
2 days ago
LLMs can use search engines as a tool. One possibility is that Google embeds the search query with these embeddings, does retrieval against them, and then pastes the retrieved result into the model's chain of thought (which, unless they have an external memory module in their model, is basically the model's only working memory).
I'm reading the docs and it does not appear that Google keeps these embeddings at all: I send them some text, and they return the embedding for that text at the size I specified.
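For concreteness, roughly what that round trip looks like — a sketch assuming the google-generativeai Python SDK; the model name and output size here are just illustrative:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Send a piece of text, get back a fixed-size vector. As far as the docs
# indicate, nothing is retained server-side.
resp = genai.embed_content(
    model="models/text-embedding-004",   # illustrative model name
    content="How do transformers use attention?",
    output_dimensionality=256,           # the size I specified
)
vector = resp["embedding"]               # a plain list of 256 floats
```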
So the flow is something like:
1. Have a text doc (or library of docs)
2. Chunk it into small pieces
3. Send each chunk to <provider> and get an embedding vector of some size back
4. Use the embedding to:
4a. Semantic search / RAG: put the embeddings in a vector DB and run a similarity search against a query embedding. The ultimate output is the matching source chunk (rough sketch after this list)
4b. Run a clustering algorithm on the embeddings to generate some kind of graph representation of my data
4c. Run a classifier on the embeddings so I can classify new data (4b/4c are also sketched below)
5. Crucially, the output of every branch in step 4 is text
6. Send that text to an LLM
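Here's a rough sketch of 4a, with a fake embed() standing in for the provider call from step 3 and plain numpy cosine similarity standing in for a real vector DB:

```python
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Hypothetical stand-in for the provider's embedding call (step 3).
    Returns a deterministic fake vector so the sketch runs end to end."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.normal(size=dim)

# Steps 1-3: chunk the doc and embed each chunk.
chunks = ["chunk one ...", "chunk two ...", "chunk three ..."]
chunk_vecs = np.stack([embed(c) for c in chunks])

# Step 4a: embed the query and take the nearest chunks by cosine similarity
# (this is essentially what a vector DB does for you, plus indexing at scale).
query_vec = embed("what does the doc say about X?")
sims = chunk_vecs @ query_vec / (
    np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(query_vec)
)
top_chunks = [chunks[i] for i in np.argsort(sims)[::-1][:2]]

# Steps 5-6: what reaches the LLM is the retrieved *text*, not the vectors.
prompt = "Answer using this context:\n" + "\n".join(top_chunks) + "\n\nQuestion: ..."
```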
At no point is the embedding directly in the model's memory.
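Same story for 4b/4c: the embeddings feed ordinary ML tooling outside the model (sketched here with scikit-learn, reusing the fake chunk_vecs and embed() from above), and only the textual outputs — cluster labels or predicted classes — would ever be handed to the LLM:

```python
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# chunk_vecs and embed() are the stand-ins from the sketch above.

# 4b: cluster the chunks; the cluster labels (plus the chunk text they point
# back to) are what you'd turn into a graph/outline and describe to the LLM.
cluster_labels = KMeans(n_clusters=3).fit_predict(chunk_vecs)

# 4c: fit a classifier on already-labelled chunks, then classify new text by
# embedding it first; the output is again just a label string.
clf = LogisticRegression(max_iter=1000).fit(chunk_vecs, ["intro", "method", "intro"])
predicted = clf.predict([embed("some new, unlabelled text")])[0]
```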