
Comment by ritvikpandey21

15 days ago

For most (text-dense) documents without much layout variation, these small prompt engineering tricks work pretty well! Scaling this to complex layouts and 1000+ page docs, we found the models don’t stick to their instructions. Perhaps there’s some work to be done with 1M+ context length models so they don’t lose layout memory.
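(Editorial aside: a minimal sketch of the kind of prompt-level instruction being described, assuming an OpenAI-style vision chat API. The model name, prompt wording, and the page_to_markdown helper are illustrative assumptions, not the commenter's actual pipeline.)

```python
# Illustrative only: a layout-preserving extraction prompt of the kind the
# comment alludes to. Model name and prompt wording are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "Convert this document page to Markdown. "
    "Preserve headings, tables, and reading order exactly. "
    "Do not summarize, reorder, or omit any text."
)

def page_to_markdown(page_image_b64: str) -> str:
    """Send one page image to a vision model and return its Markdown."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; any vision-capable model
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/png;base64,{page_image_b64}"
                        },
                    }
                ],
            },
        ],
    )
    return response.choices[0].message.content
```

Instructions like these tend to hold on short, text-dense inputs; the commenter's point is that adherence degrades as layout complexity and context length grow.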

Do any models use some sort of context pruning to keep the most relevant parts of the context?

What single documents are you processing that are 1000+ pages?

Is processing one page at a time not feasible? I'm always chunking things as small as possible for LLMs.
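(Editorial aside: a minimal sketch of what "one page at a time" might look like, assuming pdf2image for rendering and the illustrative page_to_markdown() helper above; retries, rate limiting, and cross-page stitching are omitted.)

```python
# Page-at-a-time chunking sketch. Assumes pdf2image (which needs poppler
# installed) and the hypothetical page_to_markdown() helper sketched above.
import base64
import io

from pdf2image import convert_from_path

def pdf_to_markdown(path: str) -> str:
    """Render each page to an image, convert pages independently, then join."""
    pages_md = []
    for i, image in enumerate(convert_from_path(path, dpi=200), start=1):
        buf = io.BytesIO()
        image.save(buf, format="PNG")
        b64 = base64.b64encode(buf.getvalue()).decode()
        pages_md.append(f"<!-- page {i} -->\n{page_to_markdown(b64)}")
    return "\n\n".join(pages_md)
```

Per-page calls sidestep the instruction drift seen at long context, but they lose cross-page structure (tables spanning pages, running headers), which is presumably where the 1M+ context question comes in.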