Comment by bsenftner

3 months ago

Very interesting work. What are your opinions of GraphRAG and the variations?

I'm currently evaluating systems that extend RAG as your PageIndex project does, with an eye on adaptability to new information.

A good portion of my work involves legal issues and case law. With the US going through many legal changes under the new administration, I am looking for a system that can ingest new information that imposes new rules on how information is handled, where those new rules take precedence over any similar rules already in the knowledge base.

Ingesting this new information and resolving it logically against the larger knowledge base needs to be efficient too. The original GraphRAG is expensive to begin with, and it does not appear to have any optimized handling for ingesting new, conflicting information. The GraphRAG variants that are getting a lot of attention now appear to address the inefficiency of the original GraphRAG implementation. Where does PageIndex sit within this group of similar offerings?

2 comments

mingtianzhang 3 months ago

Hi Bsenftner, thanks for your interest.

The motivation behind building PageIndex is to build a reasoning-based RAG. When we previously designed a RAG system for financial documents, we encountered two main challenges:

1. Traditional embedding-based RAG often returns redundant information because all the financial terms are semantically similar.

2. We want to incorporate expert experience into the RAG process—specifically, experts often have a preferred order of where to look first.

To address these, we developed PageIndex, which transforms long documents into a structured “table of contents.” This allows the LLM to selectively retrieve relevant nodes based on reasoning. With this approach, we can do few-shot learning by providing examples of expert preferences directly in the prompt, enabling the LLM to choose nodes more like a domain expert would.
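As a rough illustration of the idea above (not PageIndex's actual API), the "table of contents" can be pictured as a small tree whose outline is shown to the LLM together with a few expert examples. The names `TocNode`, `render_outline`, and `build_selection_prompt` are hypothetical, and the prompt wording is only an assumption about how such few-shot guidance might look:

```python
from dataclasses import dataclass, field


@dataclass
class TocNode:
    """One entry in a document's table-of-contents tree (hypothetical shape)."""
    node_id: str
    title: str
    children: list["TocNode"] = field(default_factory=list)


def render_outline(node, depth=0):
    """Flatten the tree into indented outline lines, one per node."""
    lines = [f"{'  ' * depth}[{node.node_id}] {node.title}"]
    for child in node.children:
        lines.extend(render_outline(child, depth + 1))
    return lines


def build_selection_prompt(root, question, expert_examples):
    """Build a prompt asking an LLM to pick node ids by reasoning.

    expert_examples is a list of (question, [node_id, ...]) pairs that
    encode where a domain expert would look first (few-shot examples).
    """
    shots = "\n".join(
        f"Q: {q}\nLook first in: {', '.join(ids)}" for q, ids in expert_examples
    )
    outline = "\n".join(render_outline(root))
    return (
        "Document outline:\n"
        f"{outline}\n\n"
        "Expert examples:\n"
        f"{shots}\n\n"
        f"Q: {question}\n"
        "Look first in:"
    )
```

The returned string would then be sent to whichever LLM is in use; the model's answer is a list of node ids, and only those nodes' text is retrieved.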

-----

In your case, it sounds like you're looking for a system that can automatically build and continuously update a knowledge base as new data arrives. You might benefit from something like:

1. Using expert knowledge to define a template knowledge graph—e.g., specifying entity types, link types, or a rough graph structure.

2. Building an agent that updates the knowledge graph when new documents are received. The agent’s tasks could include:

a. Identifying new information relevant to existing nodes or links.

b. Determining whether this new information changes the current knowledge graph.

c. Updating the graph accordingly.
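The steps above can be sketched roughly as follows. This is a minimal sketch under strong assumptions: the template is reduced to a set of allowed relation types, the LLM extraction step is left out (candidate rules are assumed to arrive already parsed), and "precedence" is simplified to "the rule with the later effective date wins", which real legal precedence will not always follow. `Rule`, `RuleGraph`, and `ALLOWED_RELATIONS` are hypothetical names:

```python
from dataclasses import dataclass
from datetime import date

# Step 1: expert-defined template, reduced here to allowed link types (assumption).
ALLOWED_RELATIONS = {"requires", "prohibits", "permits"}


@dataclass(frozen=True)
class Rule:
    subject: str      # entity the rule applies to
    relation: str     # link type from the template
    value: str        # what is required, prohibited, or permitted
    source: str       # document the rule came from
    effective: date   # date the rule takes effect


class RuleGraph:
    def __init__(self):
        self.current = {}     # (subject, relation) -> Rule currently in force
        self.superseded = []  # rules that lost out, kept for audit

    def update(self, candidate):
        """Steps 2a-2c: find the related rule, decide if it changes, update."""
        if candidate.relation not in ALLOWED_RELATIONS:
            return "rejected"
        key = (candidate.subject, candidate.relation)
        existing = self.current.get(key)          # 2a: related existing node
        if existing is None:
            self.current[key] = candidate
            return "added"
        if existing.value == candidate.value:     # 2b: no change to the graph
            return "unchanged"
        if candidate.effective >= existing.effective:
            self.superseded.append(existing)      # 2c: newer rule takes precedence
            self.current[key] = candidate
            return "superseded"
        self.superseded.append(candidate)         # older conflicting rule is archived
        return "ignored_older"
```

Keeping superseded rules instead of deleting them means a question about how something was handled at an earlier date can still be answered.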

Since your use case involves logical reasoning (not just semantic similarity), PageIndex and reasoning-based RAG could play a helpful role here. In other words, while a traditional graph-based RAG might still be used at inference (question answering) time, PageIndex and reasoning-based RAG can assist during the knowledge graph update phase by identifying information in the new documents that relates to the graph. Additionally, the tree structure produced by PageIndex can be used as an initialization for building your knowledge graph.
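To make the last point concrete, here is one possible way a document tree could seed a graph: each parent/child pair becomes a "contains" edge. The nested-dict shape with `id` and `children` keys is an assumption for illustration, not the actual PageIndex output format, and `tree_to_triples` is a hypothetical helper:

```python
def tree_to_triples(node, parent_id=None):
    """Turn a nested {"id": ..., "children": [...]} tree into edge triples."""
    triples = []
    if parent_id is not None:
        triples.append((parent_id, "contains", node["id"]))
    for child in node.get("children", []):
        triples.extend(tree_to_triples(child, node["id"]))
    return triples
```

These structural edges are only a starting skeleton; the entity and link types from the expert template would be layered on top as documents are processed.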

Hope this is helpful! Mingtian

bsenftner 3 months ago

Thank you for the continued interest and support. I've got PageIndex working now, and am in my exploratory R&D period. The space is deep and dynamic, plus I've got non-R&D responsibilities too. You'll be hearing from me as I start to integrate.