Comment by Imanari
3 months ago
Interesting work! How do you construct the relationship between nodes if not all documents fit into context?
3 months ago
Hi Imanari! That’s essentially one of the key challenges we’re aiming to address with our PageIndex package.
We’ve designed two LLM functions:
a. LLM Function 1: init_content -> initial_structure
b. LLM Function 2: (previous_structure, current_content) -> current_structure
The idea is to split a long document into several page groups, each small enough to fit within the context window. You first apply Function 1 to the first group to get the initial structure, then apply Function 2 in a for-loop over the remaining page groups, feeding each step's output structure into the next step to build out the rest of the structure.
This approach is commonly used in representation learning for time-series data. We'll be releasing a technical report on it soon as well.
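To make the loop concrete, here is a minimal Python sketch of the two-function scheme described above. It is not the PageIndex implementation: `group_pages`, `build_structure`, the `Structure` type, and the character-based size limit are all hypothetical names and simplifications, and the two LLM functions are passed in as plain callables so that toy stand-ins can replace real model calls.

```python
from typing import Callable, Dict, List

# Hypothetical structure type; the real structure format is not specified here.
Structure = Dict[str, List[str]]


def group_pages(pages: List[str], max_chars: int) -> List[str]:
    """Greedily pack consecutive pages into groups of at most max_chars characters.

    Assumption: size is measured in characters, not tokens. A single page
    longer than max_chars still becomes its own (oversized) group.
    """
    groups: List[str] = []
    current = ""
    for page in pages:
        if current and len(current) + len(page) > max_chars:
            groups.append(current)
            current = ""
        current += page
    if current:
        groups.append(current)
    return groups


def build_structure(
    page_groups: List[str],
    init_fn: Callable[[str], Structure],
    update_fn: Callable[[Structure, str], Structure],
) -> Structure:
    """Apply Function 1 to the first group, then fold Function 2 over the rest."""
    if not page_groups:
        raise ValueError("need at least one page group")
    structure = init_fn(page_groups[0])           # init_content -> initial_structure
    for group in page_groups[1:]:
        structure = update_fn(structure, group)   # (previous, current) -> current
    return structure


# Toy stand-ins for the two LLM functions: they just collect heading lines.
def toy_init(content: str) -> Structure:
    return {"sections": [ln for ln in content.splitlines() if ln.startswith("#")]}


def toy_update(previous: Structure, content: str) -> Structure:
    new = [ln for ln in content.splitlines() if ln.startswith("#")]
    return {"sections": previous["sections"] + new}
```

With real LLM calls, `toy_init` and `toy_update` would be replaced by functions that prompt the model with the content (and, for the second one, the structure built so far).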
Mingtian
Thanks! I have thought about similar approaches to iteratively building the content graph of a document base, as you described. I worry about scaling, though. IIUC, both previous_structure and current_content must fit into context, while previous_structure gets bigger with each iteration. Is that correct?
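As a back-of-the-envelope illustration of that concern, the sketch below estimates when the combined input would stop fitting. Everything in it is an assumption for illustration: the function name `first_overflow_step`, the linear growth model for previous_structure, and any numbers plugged in are hypothetical, not measurements of PageIndex.

```python
from typing import Optional


def first_overflow_step(
    context_budget: int,
    chunk_tokens: int,
    initial_structure_tokens: int,
    growth_per_step: int,
    max_steps: int = 10_000,
) -> Optional[int]:
    """Return the 1-based Function 2 step whose input exceeds the budget.

    Assumes previous_structure grows by a fixed number of tokens per step
    (a simplification). Returns None if no overflow occurs within max_steps.
    """
    structure_tokens = initial_structure_tokens
    for step in range(1, max_steps + 1):
        # Function 2 must see both the structure so far and the new chunk.
        if structure_tokens + chunk_tokens > context_budget:
            return step
        structure_tokens += growth_per_step
    return None
```

Under this simple model, the number of page groups that can be processed is bounded by roughly (context_budget - chunk_tokens - initial_structure_tokens) / growth_per_step, unless the structure is compressed or pruned between steps.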
EDIT: A follow-up question: how long does the structure-building take for 100 pages, and how big are the chunks you are feeding in?