Comment by Imanari

9 days ago

Interesting work! How do you construct the relationship between nodes if not all documents fit into context?

Hi Imanari! That’s essentially one of the key challenges we’re aiming to address with our PageIndex package.

We’ve designed two LLM functions:

a. LLM Function 1: init_content -> initial_structure

b. LLM Function 2: (previous_structure, current_content) -> current_structure

The idea is to split a long document into several page groups, each small enough to fit within the context window. You first apply Function 1 to the first group to get the initial structure, then apply Function 2 in a for-loop over the remaining page groups, passing in the structure built so far each time, so the structure is extended step by step.
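To make the loop concrete, here is a minimal Python sketch. It illustrates the idea under stated assumptions and is not the PageIndex implementation: the names group_pages, build_structure, init_fn, and update_fn are hypothetical, the structure is simplified to a flat list of titles, a character budget stands in for the context window, and the two LLM functions are replaced with toy stand-ins so the code runs without a model.

```python
from typing import Callable, List, Sequence

# Hypothetical simplification: the structure is a flat list of section titles.
Structure = List[str]


def group_pages(pages: Sequence[str], max_chars: int) -> List[List[str]]:
    """Greedily pack consecutive pages into groups of at most max_chars characters.

    A single page longer than max_chars still gets a group of its own.
    """
    groups: List[List[str]] = []
    current: List[str] = []
    size = 0
    for page in pages:
        # Close the current group if adding this page would exceed the budget.
        if current and size + len(page) > max_chars:
            groups.append(current)
            current, size = [], 0
        current.append(page)
        size += len(page)
    if current:
        groups.append(current)
    return groups


def build_structure(
    groups: Sequence[List[str]],
    init_fn: Callable[[List[str]], Structure],
    update_fn: Callable[[Structure, List[str]], Structure],
) -> Structure:
    """Apply Function 1 to the first group, then Function 2 to each later group."""
    if not groups:
        raise ValueError("need at least one page group")
    structure = init_fn(groups[0])  # Function 1: init_content -> initial_structure
    for group in groups[1:]:
        # Function 2: (previous_structure, current_content) -> current_structure
        structure = update_fn(structure, group)
    return structure


# Toy stand-ins for the two LLM functions (assumption: a real version would
# prompt a model). Here the first line of each page is treated as its title.
def toy_init(group: List[str]) -> Structure:
    return [page.split("\n")[0] for page in group]


def toy_update(previous: Structure, group: List[str]) -> Structure:
    return previous + [page.split("\n")[0] for page in group]


pages = ["A\nxxxx", "B\nyyyy", "C\nzzzz"]
groups = group_pages(pages, max_chars=12)
structure = build_structure(groups, toy_init, toy_update)
```

Only the previous structure and the current page group are passed to Function 2 at each step, so the prompt size depends on the group size and the structure built so far, not on the length of the whole document.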

This kind of recurrent update, where a running state is carried forward and revised with each new chunk, is commonly used in representation learning for time-series data. We'll be releasing a technical report on it soon as well.

Mingtian