Comment by letitgo12345
2 days ago
LLMs can use search engines as a tool. One possibility is that Google embeds the search query with these embeddings, does retrieval against them, and then pastes the retrieved result into the model's chain of thought (which, unless they have an external memory module in their model, is basically the model's only working memory).
I'm reading the docs and it does not appear that Google keeps these embeddings at all: I send them some text, and they return the embedding for that text at the size I specified.
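For concreteness, roughly what that round trip looks like — a sketch assuming the google-generativeai Python SDK; the model name and output size here are just illustrative:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Send a piece of text, get back a fixed-size vector. As far as the docs
# indicate, nothing is retained server-side.
resp = genai.embed_content(
    model="models/text-embedding-004",   # illustrative model name
    content="How do transformers use attention?",
    output_dimensionality=256,           # the size I specified
)
vector = resp["embedding"]               # a plain list of 256 floats
```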
So the flow is something like:
1. Have a text doc (or library of docs)
2. Chunk it into small pieces
3. Send each chunk to <provider> and get an embedding vector of some size back
4. Use the embedding to:
4a. Semantic search / RAG: put the embeddings in a vector DB and run a similarity search against a query embedding. The ultimate output is the matching source chunk (rough sketch after this list)
4b. Run a clustering algorithm on the embeddings to generate some kind of graph representation of my data
4c. Run a classifier on the embeddings so I can classify new data (4b/4c are also sketched below)
5. Crucially, the output of every branch in step 4 is text
6. Send that text to an LLM
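Here's a rough sketch of 4a, with a fake embed() standing in for the provider call from step 3 and plain numpy cosine similarity standing in for a real vector DB:

```python
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Hypothetical stand-in for the provider's embedding call (step 3).
    Returns a deterministic fake vector so the sketch runs end to end."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.normal(size=dim)

# Steps 1-3: chunk the doc and embed each chunk.
chunks = ["chunk one ...", "chunk two ...", "chunk three ..."]
chunk_vecs = np.stack([embed(c) for c in chunks])

# Step 4a: embed the query and take the nearest chunks by cosine similarity
# (this is essentially what a vector DB does for you, plus indexing at scale).
query_vec = embed("what does the doc say about X?")
sims = chunk_vecs @ query_vec / (
    np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(query_vec)
)
top_chunks = [chunks[i] for i in np.argsort(sims)[::-1][:2]]

# Steps 5-6: what reaches the LLM is the retrieved *text*, not the vectors.
prompt = "Answer using this context:\n" + "\n".join(top_chunks) + "\n\nQuestion: ..."
```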
At no point is the embedding directly in the model's memory.
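Same story for 4b/4c: the embeddings feed ordinary ML tooling outside the model (sketched here with scikit-learn, reusing the fake chunk_vecs and embed() from above), and only the textual outputs — cluster labels or predicted classes — would ever be handed to the LLM:

```python
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# chunk_vecs and embed() are the stand-ins from the sketch above.

# 4b: cluster the chunks; the cluster labels (plus the chunk text they point
# back to) are what you'd turn into a graph/outline and describe to the LLM.
cluster_labels = KMeans(n_clusters=3).fit_predict(chunk_vecs)

# 4c: fit a classifier on already-labelled chunks, then classify new text by
# embedding it first; the output is again just a label string.
clf = LogisticRegression(max_iter=1000).fit(chunk_vecs, ["intro", "method", "intro"])
predicted = clf.predict([embed("some new, unlabelled text")])[0]
```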