Comment by lmeyerov
7 months ago
So subtle! The article is about doing exactly that, which is something we are doing a lot of right now... though it seems to snatch defeat from the jaws of victory:
If we think about what this is about, it is basically entity augmentation & lexical linking / citations.
Ex: A patient document may be all about patient id 123. That won't be spelled out in every paragraph, but by carrying along the patient ID (semantic entity) and the document (citation), the combined model gets access to them. A naive one-shot retrieval over a naive chunked vector index would want that in the text/embedding, while a smarter one would also put it in the entry metadata. And as others write, this helps move reasoning from the symbolic domain to the semantic domain, so it is less of a hack.
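Rough sketch of what I mean, just to make it concrete (the Chunk record, field names, and file path are illustrative, not any particular library):

    # Carry entity IDs and a citation alongside each chunk so retrieval can
    # filter/boost on metadata as well as on the embedding itself.
    from dataclasses import dataclass, field

    @dataclass
    class Chunk:
        text: str                   # the raw chunk text that gets embedded
        doc_id: str                 # citation back to the source document
        entities: dict = field(default_factory=dict)  # e.g. {"patient_id": "123"}

    chunk = Chunk(
        text="Follow-up visit notes: blood pressure stable...",
        doc_id="patient-docs/visit-2024-03.pdf",
        entities={"patient_id": "123"},
    )

    # A naive index embeds only chunk.text; a smarter one also stores
    # chunk.entities, so a query about patient 123 can match the metadata
    # even when the paragraph itself never repeats the ID.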
We are working on some fun 'pure-vector' graph RAG work here to tackle production problems around scale, quality, & always-on scenarios like alerting - happy to chat!
Also working with GRAG (via Neo4j), and I'm somewhat skeptical that, for most cases where a natural hierarchical structure already exists, graph will significantly exceed RAG that uses that hierarchical structure.
A better solution I had thought about is "local RAG". I came across this while processing embeddings from chunks parsed out of Azure Document Intelligence JSON. The realization is that relevant topics are often localized within a document. Even across a corpus of documents, relevant passages are localized.
Because the chunks are processed sequentially, one only needs to keep track of the sequence number of each chunk. If an embedding matches chunk n, it follows that the most important context is in the chunks localized at n - m and n + p. So find the top x chunks via hybrid embedding + full-text match and expand outwards from each of those chunks to grab the chunks around it.
While a chunk may represent just a few sentences of a larger block of text, this strategy will possibly grab the whole section or page of text localized around the chunk with the highest match.
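A minimal sketch of that expansion, assuming each hit carries the sequence number recorded at parse time and chunks is the list of chunks in document order (names are placeholders, not a specific framework):

    # top-x hits come from your hybrid (embedding + full-text) search;
    # expand each hit n outwards to n - m .. n + p and dedupe.
    def expand_locally(hits, chunks, m=2, p=2):
        picked = set()
        for hit in hits:
            n = hit.seq                      # sequence number stored with the chunk
            lo, hi = max(0, n - m), min(len(chunks) - 1, n + p)
            picked.update(range(lo, hi + 1))
        return [chunks[i] for i in sorted(picked)]  # whole section/page around each match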
This works as long as relevant information is colocated. Sometimes, though, for example in financial documents, important parts reference each other through keywords etc. That's why you can always try to retrieve not only positionally related chunks but also semantically related ones.
Go for chunks n, n - m, n + p, and n', where n' are the chunks closest to n semantically.
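A quick numpy sketch of that combination, assuming the chunk embeddings sit in one matrix in document order (function and variable names are placeholders):

    import numpy as np

    def expand_positional_and_semantic(n, chunks, embeddings, m=2, p=2, k=3):
        # positional neighbors: n - m .. n + p
        positional = set(range(max(0, n - m), min(len(chunks) - 1, n + p) + 1))

        # semantic neighbors n': the k chunks with the highest cosine similarity to chunk n
        q = embeddings[n] / np.linalg.norm(embeddings[n])
        sims = embeddings @ q / np.linalg.norm(embeddings, axis=1)
        sims[n] = -1.0                        # don't pick n itself
        semantic = set(int(i) for i in np.argsort(-sims)[:k])

        return [chunks[i] for i in sorted(positional | semantic)]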
Moreover, you can expose this traversal to your LLM as a tool (or whatever) that it can call itself whenever it is missing crucial information to answer the question. Thanks to that, you don't always retrieve thousands of tokens when they aren't needed.
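For example, something like this as the tool definition; the exact function-calling schema depends on your LLM provider, and the name and parameters here are just illustrative:

    # Expose the traversal as an on-demand tool instead of always retrieving
    # a large context up front. Generic JSON-schema-style tool description.
    fetch_neighbor_chunks_tool = {
        "name": "fetch_neighbor_chunks",
        "description": "Fetch chunks adjacent to a given chunk, positionally and/or "
                       "semantically, when the current context is missing information.",
        "parameters": {
            "type": "object",
            "properties": {
                "chunk_id": {"type": "string", "description": "ID of the chunk to expand around"},
                "mode": {"type": "string", "enum": ["positional", "semantic", "both"]},
                "count": {"type": "integer", "description": "How many neighbors to return"},
            },
            "required": ["chunk_id", "mode"],
        },
    }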
That's why the entry point would still be an embedding search; it's just that instead of using the first 20 embedding hits, you take the first 5, and if the reference is "semantically adjacent" to the entry concept, we would expect some of those first few chunks to capture it in most cases.
I think where GRAG yields more relevancy is when the referenced content is neither semantically similar nor even semantically adjacent to the entry concept, but is semantically similar to some sub-fragment of a matched chunk. Depending on the corpus, this can either be common (I have no familiarity with financial documents) or rare. I've primarily worked with clinical trial protocols, and at least in that space, the concepts are what I would consider "snowflake-shaped": they branch out pretty cleanly and rarely cross-reference (because it is more common that the relevant reference is repeated).
All that said, I think that as a matter of practicality, most teams will probably get a much bigger yield with much less effort by doing local expansion based on semantic-similarity matching first, since it addresses two core problems with embeddings (text chunk size vs. embedding accuracy, and relevancy of embeddings matched below a given threshold). Experiment with GRAG depending on the type of questions you're trying to answer and the nature of the underlying content. Don't get me wrong; I'm not saying GRAG has no benefit, just that most teams can explore other ways of using RAG before trying GRAG.