Comment by btown
8 hours ago
I do think that what we think of as RAG will change!
When any given document can fit into context, and when we can generate highly mission-specific summarization and retrieval engines (for which large amounts of production data can be held in context as they are being implemented)... is the way we index and retrieve still going to be based on naive chunking and off-the-shelf embedding models?
For instance, consider a system that reads every article and continuously updates a list of potential keywords with each document (and the code assumptions that led to those documents being generated), then re-runs over the corpus to tag each article with those keywords and weights, and does the same to explode a query into relevant keywords with weights. This is still RAG, but arguably a version where dimensionality is more closely tied to your data.
(Such a system, for instance, might directly intuit the difference in vector space between "pet-friendly" and "pets considered," or between legal procedures that are treated differently in different jurisdictions. Naive RAG can throw dimensions at this, and your large-context post-processing may just be able to read all the candidates for relevance... but is this optimal?)
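The idea above can be sketched as a toy index: every document contributes to a shared keyword vocabulary, each document is tagged with per-keyword weights, and a query is "exploded" into weighted keywords over that same vocabulary. This is a minimal illustrative sketch, not the commenter's actual design; the class name, the whitespace tokenizer, and the TF-IDF-style weighting are all assumptions standing in for the mission-specific extraction the comment imagines.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Naive tokenizer: lowercase alphabetic runs. A real system would use
    # the LLM-derived keyword list the comment describes instead.
    return re.findall(r"[a-z]+", text.lower())


class KeywordIndex:
    """Toy weighted-keyword index (hypothetical sketch)."""

    def __init__(self):
        self.docs = {}             # doc_id -> Counter of keyword counts
        self.doc_freq = Counter()  # keyword -> number of docs containing it

    def add(self, doc_id, text):
        counts = Counter(tokenize(text))
        self.docs[doc_id] = counts
        self.doc_freq.update(counts.keys())

    def _weights(self, counts):
        # TF-IDF-style weighting: rarer keywords across the corpus get
        # higher weight, mimicking "keywords and weights" per document.
        n = len(self.docs)
        return {
            w: c * math.log((1 + n) / (1 + self.doc_freq[w]))
            for w, c in counts.items()
        }

    def search(self, query, k=3):
        # "Explode" the query into weighted keywords with the same scheme,
        # then score documents by weighted keyword overlap.
        q = self._weights(Counter(tokenize(query)))
        scores = {
            doc_id: sum(q.get(w, 0.0) * wt
                        for w, wt in self._weights(counts).items())
            for doc_id, counts in self.docs.items()
        }
        return sorted(scores, key=scores.get, reverse=True)[:k]


index = KeywordIndex()
index.add("a1", "Pet-friendly apartments welcome dogs and cats")
index.add("a2", "Pets considered on a case by case basis")
index.add("a3", "Jurisdiction-specific legal procedures for tenants")
print(index.search("pet friendly rentals"))  # "a1" ranks first
```

Note that this sketch still cannot distinguish "pet-friendly" from "pets considered" beyond surface tokens; closing that gap is exactly where the comment suggests an LLM-maintained keyword list would replace the naive tokenizer and static weighting here.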
I'm very curious whether benchmarks have been done on this kind of approach.