Comment by mingtianzhang

13 days ago

Hi Bsenftner, thanks for your interest.

The motivation behind building PageIndex is to build a reasoning-based RAG. When we previously designed a RAG system for financial documents, we encountered two main challenges:

1. Traditional embedding-based RAG often returns redundant information because all the financial terms are semantically similar.

2. We want to incorporate expert experience into the RAG process—specifically, experts often have a preferred order of where to look first.

To address these, we developed PageIndex, which transforms long documents into a structured “table of contents.” This allows the LLM to selectively retrieve relevant nodes based on reasoning. With this approach, we can do few-shot learning by providing examples of expert preferences directly in the prompt, enabling the LLM to choose nodes more like a domain expert would.
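To make the idea concrete, here is a minimal sketch of how a table-of-contents tree and a few expert examples could be combined into one node-selection prompt. This is not PageIndex's actual API: `TocNode`, `render_toc`, `build_selection_prompt`, and the sample report are all hypothetical names and data made up for illustration, and the LLM call itself is left out.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TocNode:
    """One entry in the table-of-contents tree (hypothetical shape)."""
    node_id: str
    title: str
    children: List["TocNode"] = field(default_factory=list)


def render_toc(node: TocNode, depth: int = 0) -> List[str]:
    """Flatten the tree into indented lines that an LLM can read."""
    lines = [f"{'  ' * depth}[{node.node_id}] {node.title}"]
    for child in node.children:
        lines.extend(render_toc(child, depth + 1))
    return lines


def build_selection_prompt(
    root: TocNode,
    question: str,
    expert_examples: List[Tuple[str, List[str]]],
) -> str:
    """Build a few-shot prompt: expert examples first, then the tree, then the question."""
    shots = "\n\n".join(
        f"Question: {q}\nLook first at: {', '.join(ids)}"
        for q, ids in expert_examples
    )
    toc = "\n".join(render_toc(root))
    return (
        "Pick the node ids most likely to answer the question.\n\n"
        f"Expert examples:\n{shots}\n\n"
        f"Table of contents:\n{toc}\n\n"
        f"Question: {question}\nLook first at:"
    )


# Made-up document tree and a single made-up expert preference.
report = TocNode("0", "Annual Report", [
    TocNode("1", "Management Discussion", [TocNode("1.1", "Liquidity")]),
    TocNode("2", "Financial Statements"),
])
prompt = build_selection_prompt(
    report,
    "How much cash is on hand?",
    [("What drove revenue growth?", ["1"])],
)
```

The resulting `prompt` string would be sent to whichever LLM you use; the node ids it returns tell you which sections to read in full.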

-----

In your case, it sounds like you're looking for a system that can automatically build and continuously update a knowledge base as new data arrives. You might benefit from something like:

1. Using expert knowledge to define a template knowledge graph—e.g., specifying entity types, link types, or a rough graph structure.

2. Building an agent that updates the knowledge graph when new documents are received. The agent’s tasks could include:

a. Identifying new information relevant to existing nodes or links.

b. Determining whether this new information changes the current knowledge graph.

c. Updating the graph accordingly.
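Steps a to c could be sketched roughly as below. Everything here is an assumption for illustration: the `Fact` and `KnowledgeGraph` classes, the `apply_facts` helper, and the simplification that each (subject, link type) pair holds a single value. Extracting facts from a new document, which is where an LLM or reasoning-based retrieval would come in, is not shown; the function simply takes already-extracted facts.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Fact:
    """A candidate edge extracted from a new document (hypothetical shape)."""
    subject: str
    link_type: str
    obj: str


@dataclass
class KnowledgeGraph:
    # The expert-defined template: which link types the graph accepts.
    allowed_link_types: Set[str]
    # Assumption: one value per (subject, link_type) pair.
    edges: Dict[Tuple[str, str], str] = field(default_factory=dict)


def apply_facts(graph: KnowledgeGraph, facts: List[Fact]) -> List[Fact]:
    """Apply new facts to the graph and return only those that changed it."""
    applied = []
    for fact in facts:
        # (a) Relevant only if the template knows this link type.
        if fact.link_type not in graph.allowed_link_types:
            continue
        key = (fact.subject, fact.link_type)
        # (b) Does this fact change what the graph currently says?
        if graph.edges.get(key) == fact.obj:
            continue
        # (c) Update the graph.
        graph.edges[key] = fact.obj
        applied.append(fact)
    return applied


kg = KnowledgeGraph(allowed_link_types={"ceo_of", "acquired"})
changes = apply_facts(kg, [
    Fact("Alice", "ceo_of", "Acme"),
    Fact("Alice", "likes", "tea"),    # link type not in the template, skipped
    Fact("Alice", "ceo_of", "Acme"),  # already recorded, no change
])
```

A real agent would probably also keep provenance (which document produced each edge) and handle conflicting facts more carefully than simply overwriting.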

Since your use case involves logical reasoning (not just semantic similarity), PageIndex and reasoning-based RAG could play a helpful role here. In other words, while a traditional graph-based RAG might still be used at inference (question answering) time, PageIndex and reasoning-based RAG can assist during the knowledge graph update phase by identifying information in the new documents that is relevant to the graph. Additionally, the tree structure produced by PageIndex can be used as an initialization for building your knowledge graph.

Hope this is helpful! Mingtian

Thank you for the continued interest and support. I've got PageIndex working now, and am in my exploratory R&D period. The space is deep and dynamic, plus I've got non-R&D responsibilities too. You'll be hearing from me, as I start to integrate.