
Comment by ritvikpandey21

15 days ago

For most (text-dense) documents without much layout variation, these small prompt engineering tricks work pretty well! Scaling this to complex layouts and 1000+ page docs, we found the models don’t stick to their instructions. Perhaps there’s some work to be done with 1M+ context length models so they don’t lose layout memory.
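(Editorial aside: a minimal sketch of the kind of prompt-level instruction being described, assuming an OpenAI-style vision chat API. The model name, prompt wording, and the page_to_markdown helper are illustrative assumptions, not the commenter's actual pipeline.)

```python
# Illustrative only: a layout-preserving extraction prompt of the kind the
# comment alludes to. Model name and prompt wording are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "Convert this document page to Markdown. "
    "Preserve headings, tables, and reading order exactly. "
    "Do not summarize, reorder, or omit any text."
)

def page_to_markdown(page_image_b64: str) -> str:
    """Send one page image to a vision model and return its Markdown."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; any vision-capable model
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/png;base64,{page_image_b64}"
                        },
                    }
                ],
            },
        ],
    )
    return response.choices[0].message.content
```

Instructions like these tend to hold on short, text-dense inputs; the commenter's point is that adherence degrades as layout complexity and context length grow.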

Do any models use some sort of context pruning to keep the most relevant parts of the context?

What single documents are you processing that are 1000+ pages?

Is processing one page at a time not feasible? I'm always chunking things as small as possible for LLMs.
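(Editorial aside: a minimal sketch of what "one page at a time" might look like, assuming pdf2image for rendering and the illustrative page_to_markdown() helper above; retries, rate limiting, and cross-page stitching are omitted.)

```python
# Page-at-a-time chunking sketch. Assumes pdf2image (which needs poppler
# installed) and the hypothetical page_to_markdown() helper sketched above.
import base64
import io

from pdf2image import convert_from_path

def pdf_to_markdown(path: str) -> str:
    """Render each page to an image, convert pages independently, then join."""
    pages_md = []
    for i, image in enumerate(convert_from_path(path, dpi=200), start=1):
        buf = io.BytesIO()
        image.save(buf, format="PNG")
        b64 = base64.b64encode(buf.getvalue()).decode()
        pages_md.append(f"<!-- page {i} -->\n{page_to_markdown(b64)}")
    return "\n\n".join(pages_md)
```

Per-page calls sidestep the instruction drift seen at long context, but they lose cross-page structure (tables spanning pages, running headers), which is presumably where the 1M+ context question comes in.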