Comment by yawnxyz
7 days ago
I'm curious if chunking is different for embeddings vs. for "agentic retrieval", where an AI or a person operates like a librarian: they consult an index to find the right resources, pull the relevant bits, then piece them together into a cohesive narrative whole. Would we do any chunking at all for this, or does it rely purely on how the DB is set up? I think for certain use cases even a single DB record could be too large for context windows, so maybe chunking would need to be done on the record itself? (e.g. a db of research papers)
Great questions!
Chunking fundamentals remain the same whether you're doing traditional semantic search or agentic retrieval. The key difference lies in the retrieval strategy, not the chunking approach itself.
For quality agentic retrieval, you still need to create a knowledge base by chunking documents, generating embeddings, and storing them in a vector database. You can add organizational structure here, like creating separate collections for different document categories (physics papers, biology papers, etc.), though how much that organization matters depends on the size and diversity of your source data.
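Here's a rough sketch of that indexing step, assuming Chroma as the vector store (any vector DB with collections works similarly). The fixed-size chunker, chunk sizes, and file name are illustrative placeholders, not a recommendation:

```python
import chromadb

client = chromadb.Client()

# One collection per document category, as described above.
physics = client.create_collection("physics_papers")

def chunk(text, size=1000, overlap=200):
    """Naive fixed-size chunking with overlap; a real pipeline would
    split on semantic boundaries (sections, paragraphs) instead."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

paper_text = open("paper.txt").read()  # placeholder input
chunks = chunk(paper_text)
physics.add(
    documents=chunks,  # Chroma embeds these with its default embedding model
    ids=[f"paper-1-chunk-{i}" for i in range(len(chunks))],
    metadatas=[{"source": "paper-1"}] * len(chunks),
)
```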
The agent then operates exactly as you described: it queries the vector database, retrieves relevant chunks, and synthesizes them into a coherent response. The chunking strategy should still optimize for semantic coherence and appropriate context window usage.
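Concretely, the librarian step might look like this (continuing the Chroma sketch above; `llm()` is a stand-in for whatever model call you use, not a real API):

```python
# Query the collection, gather the top chunks, and hand them to a
# model for synthesis.
results = physics.query(
    query_texts=["How do gravitational waves form?"], n_results=5
)
top_chunks = results["documents"][0]  # chunks for the first (only) query

context = "\n\n---\n\n".join(top_chunks)
prompt = (
    "Using only the excerpts below, answer the question.\n\n"
    f"{context}\n\nQuestion: How do gravitational waves form?"
)
# answer = llm(prompt)  # synthesis happens in the model call
```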
Regarding your concern about large DB records: you're absolutely right. Even individual research papers often exceed context windows, so you'd still need to chunk them into smaller, semantically meaningful pieces (by section: abstract, methodology, results, and so on). The agent can then retrieve and combine multiple chunks from the same paper or across papers as needed.
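For papers specifically, a section-aware splitter is one way to do that. A hedged sketch; the heading regex is an assumption about plain-text papers with headings on their own lines, so adjust it to your corpus:

```python
import re

SECTION_RE = re.compile(
    r"^(Abstract|Introduction|Methods?|Results|Discussion|Conclusion)\s*$",
    re.IGNORECASE | re.MULTILINE,
)

def split_by_section(paper_text):
    """Return {section name: body}; re.split keeps the captured headings."""
    parts = SECTION_RE.split(paper_text)
    # parts = [preamble, heading1, body1, heading2, body2, ...]
    return {
        heading.title(): body.strip()
        for heading, body in zip(parts[1::2], parts[2::2])
    }
```

Sections that still exceed your size budget can then go through the fixed-size splitter from the earlier sketch.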
The main advantage of agentic retrieval is that the agent can make multiple queries, refine its search strategy, and iteratively build context, but it still relies on well-chunked, embedded content in the underlying vector database.
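That iterative loop can be as simple as the sketch below. The ANSWER/SEARCH protocol and the `llm` callable are assumptions for illustration, not a fixed recipe:

```python
def agentic_retrieve(question, collection, llm, max_rounds=3):
    """Iteratively query the vector DB, letting the model decide whether
    it can answer yet or needs a refined search. `llm` is any callable
    mapping a prompt string to a completion string (an assumption)."""
    gathered, query = [], question
    for _ in range(max_rounds):
        hits = collection.query(query_texts=[query], n_results=5)
        gathered.extend(hits["documents"][0])
        decision = llm(
            "Context so far:\n" + "\n---\n".join(gathered)
            + f"\n\nQuestion: {question}\n"
            "Reply 'ANSWER: <answer>' if the context suffices, "
            "or 'SEARCH: <new query>' to retrieve more."
        )
        if decision.startswith("ANSWER:"):
            return decision.removeprefix("ANSWER:").strip()
        query = decision.removeprefix("SEARCH:").strip()
    # Out of rounds: answer with whatever context was gathered.
    return llm("Answer from this context:\n" + "\n---\n".join(gathered)
               + f"\n\nQuestion: {question}")
```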