Comment by lmeyerov
1 year ago
I've been curious about this use case, so cool to see, and more so, to know it worked!
This is essentially a realization of how graph-RAG-flavored systems work under the hood. Basically, you create hierarchical summary indexes, such as topical cross-document ones, and tune the summaries to your domain. At retrieval time, a single question can then leverage richer multi-hop concepts spanning ideas that are individually and lexically distinct but get used together. Smarter retrievers can choose to dynamically expand on this (agentic: 'follow links') or work more in bulk on the digests ('map/reduce over summaries') without having to run every chunk through the LLM.
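To make the summary-index idea concrete, here is a minimal sketch, not how any particular graph RAG product does it. Everything in it is an assumption for illustration: `summarize` stands in for an LLM call with a domain-tuned prompt, the bag-of-words `cosine` stands in for embedding similarity, and `SummaryNode`, `build_index`, and `retrieve` are made-up names. The `expand` flag switches between the 'follow links' mode (return the underlying chunks) and the bulk mode (return only the digests).

```python
from collections import Counter
from dataclasses import dataclass, field
import math
import re


def tokens(text: str) -> Counter:
    """Bag-of-words stand-in for an embedding model (assumption)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarize(texts: list[str]) -> str:
    """Placeholder for an LLM summary call (assumption):
    it just joins the first sentence of each chunk."""
    return " ".join(t.split(".")[0].strip() + "." for t in texts)


@dataclass
class SummaryNode:
    topic: str
    summary: str                      # cross-document digest for the topic
    chunks: list[str] = field(default_factory=list)  # links back to the sources


def build_index(chunks_by_topic: dict[str, list[str]]) -> list[SummaryNode]:
    """One summary node per topic, each keeping links to its chunks."""
    return [SummaryNode(topic, summarize(chunks), chunks)
            for topic, chunks in chunks_by_topic.items()]


def retrieve(index: list[SummaryNode], question: str,
             top_summaries: int = 1, expand: bool = True) -> list[str]:
    """Rank summaries against the question, then either follow links
    to the underlying chunks or return the digests themselves."""
    q = tokens(question)
    ranked = sorted(index, key=lambda n: cosine(q, tokens(n.summary)),
                    reverse=True)[:top_summaries]
    if not expand:
        return [n.summary for n in ranked]        # bulk mode: digests only
    return [c for n in ranked for c in n.chunks]  # 'follow links' mode
```

Matching happens against the summaries only, so the raw chunks are touched just for the nodes that rank highest.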
Once you understand what is going on in core graph RAG, you can even add non-vector relationships to the indexing and retrieval steps, such as ones from a static code analysis, which afaict is the idea here. Likewise, for a given domain, you can use custom templates to tune what goes into each summary, like different wiki page styles for different topics. (Note: despite the name & vendor advertising, neither a graph DB nor a knowledge graph is needed for graph RAG, which makes its relationship to autowiki etc. concepts less obvious.)
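As a rough illustration of a non-vector relationship, the sketch below pulls call edges out of Python source with the standard `ast` module and uses them to widen a retrieved set by a few hops. It is an assumption about what "from a static code analysis" could look like, not the approach used in the linked project; `call_edges`, `expand`, and the sample `SOURCE` are hypothetical, and only direct calls between top-level functions are tracked.

```python
import ast

# Hypothetical sample module to analyze.
SOURCE = '''
def load(path):
    return open(path).read()

def parse(path):
    return load(path).split()

def report(path):
    return len(parse(path))
'''


def call_edges(source: str) -> dict[str, set[str]]:
    """Map each top-level function to the top-level functions it calls by name."""
    tree = ast.parse(source)
    functions = [n for n in tree.body if isinstance(n, ast.FunctionDef)]
    defined = {fn.name for fn in functions}
    edges: dict[str, set[str]] = {name: set() for name in defined}
    for fn in functions:
        for node in ast.walk(fn):
            # Only plain-name calls such as load(...); attribute calls are ignored.
            if (isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Name)
                    and node.func.id in defined):
                edges[fn.name].add(node.func.id)
    return edges


def expand(seeds: set[str], edges: dict[str, set[str]], hops: int = 1) -> set[str]:
    """Add callees reachable within `hops` steps of the seed functions."""
    found = set(seeds)
    frontier = set(seeds)
    for _ in range(hops):
        frontier = {callee for name in frontier
                    for callee in edges.get(name, ())} - found
        found |= frontier
    return found
```

With this in place, a vector hit on `report` can pull in `parse` and `load` even when their text looks nothing like the question.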
We are building out some tech here to deal with core production issues, like updating/adding items without reindexing everything and making larger ingests faster+cheaper. E.g., imagine monitoring a heavy feed or a quickly changing repo. If of interest to anyone, please ping - we are putting together design partner cohorts for the RAG phase of louie.ai.
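For the 'update without reindexing everything' problem, one common generic trick is content hashing: re-summarize only the items whose text changed. The sketch below shows that idea and nothing more. It is not a description of how louie.ai works; `IncrementalIndex` and `sync` are hypothetical names, and the `summarize` callable stands in for whatever expensive LLM step you are trying to avoid repeating.

```python
import hashlib
from typing import Callable


class IncrementalIndex:
    """Keeps one summary per document and skips unchanged documents on sync."""

    def __init__(self, summarize: Callable[[str], str]):
        self._summarize = summarize          # expensive step, e.g. an LLM call
        self._hashes: dict[str, str] = {}    # doc id -> content hash
        self.summaries: dict[str, str] = {}  # doc id -> summary

    def sync(self, docs: dict[str, str]) -> list[str]:
        """Bring the index in line with `docs`.

        Returns the ids that were (re)summarized, in input order."""
        changed = []
        for doc_id, text in docs.items():
            digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
            if self._hashes.get(doc_id) != digest:
                self._hashes[doc_id] = digest
                self.summaries[doc_id] = self._summarize(text)
                changed.append(doc_id)
        # Drop documents that no longer exist in the source.
        for doc_id in set(self._hashes) - set(docs):
            del self._hashes[doc_id]
            del self.summaries[doc_id]
        return changed
```

Any higher-level summaries built on top of a changed document would still need a refresh, so a fuller version would also track which digests depend on which items.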
Nice!