Comment by energy123
1 month ago
This problem will be eaten by OpenAI et al. the same way the careful prompting strategies used in 2022/2023 were eaten. In a few years we will have context lengths of 10M+ or online fine-tuning, combined with agents that can proactively call APIs and navigate your desktop environment.
Providing all context will be little more than copying and pasting everything, or just letting the agent do its thing.
Super careful or complicated setups to filter and manage context probably won't be needed.
Context requires quadratic VRAM. That is why OpenAI hasn't supported even a 200k context length for its 4o model yet.
Is there a trick that bypasses this scaling constraint while strictly preserving attention quality? I suspect most such tricks lose performance deep into the context.
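For scale, a rough back-of-envelope (all numbers illustrative: fp16 scores, 64 heads, and naive materialization of the full score matrix; FlashAttention-style kernels avoid materializing it, but the compute stays quadratic):

```python
# Back-of-envelope: memory for naively materialized attention scores.
# Hypothetical settings: 64 heads, fp16 (2 bytes per score).

def naive_score_matrix_bytes(n_tokens, n_heads=64, bytes_per_score=2):
    # One n x n score matrix per head.
    return n_tokens ** 2 * n_heads * bytes_per_score

for n in (8_000, 200_000, 1_000_000):
    gib = naive_score_matrix_bytes(n) / 2 ** 30
    print(f"{n:>9,} tokens -> {gib:>12,.0f} GiB of scores")
```

Even if memory is kept linear with a fused kernel, the quadratic compute means a 200k-token prompt costs roughly 625x the attention FLOPs of an 8k one.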
I wouldn't bet against this. Whether it's Ring Attention, Mamba layers, or online fine-tuning, I assume this technical challenge will be conquered sooner rather than later. Gemini is already getting good needle-in-a-haystack results at 1M context length.
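A minimal sketch of one such trick, causal linear attention in the kernelized form of Katharopoulos et al. (2020); illustrative only, not any production model's implementation. The fixed-size running state is what buys the scaling, and compressing the past that way is also exactly where quality deep into the context can degrade:

```python
import numpy as np

def linear_attention(Q, K, V):
    """Causal linear attention (Katharopoulos et al. 2020 style).
    Memory is O(d * d_v) regardless of sequence length: only a running
    state is kept, never an n x n score matrix."""
    # Feature map elu(x) + 1, kept positive so the normalizer is valid.
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))
    Qf, Kf = phi(Q), phi(K)
    S = np.zeros((K.shape[1], V.shape[1]))  # running sum of k_i v_i^T
    z = np.zeros(K.shape[1])                # running normalizer, sum of k_i
    out = np.empty_like(V)
    for i in range(len(Q)):
        S += np.outer(Kf[i], V[i])
        z += Kf[i]
        out[i] = (Qf[i] @ S) / (Qf[i] @ z + 1e-8)
    return out

# Toy usage: 10k tokens, d=64 -- the state stays 64 x 64 no matter how long.
n, d = 10_000, 64
rng = np.random.default_rng(0)
out = linear_attention(rng.normal(size=(n, d)),
                       rng.normal(size=(n, d)),
                       rng.normal(size=(n, d)))
```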
I suspect the sustainable value will be in providing context that isn't easily accessible as a copy and paste from your hard drive. Whatever that looks like.
Even subpar attention quality is typically better than human memory. We can imagine models that do some sort of triage between a shorter high-quality attention context and an extremely long linear (or similar) context.
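To make that triage idea concrete, here is a toy sketch (made-up parameters, not a description of any shipped model): exact softmax attention over a recent window, with everything older collapsed into one coarse summary slot.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def triaged_attention(Q, K, V, window=4096):
    """Toy triage: exact softmax attention over the last `window` tokens,
    plus one mean-pooled key/value standing in for everything older, so
    the softmax only ever sees window + 1 slots. (For simplicity the old
    K/V are re-pooled each step; a real system would maintain a running
    summary instead of keeping them around.)"""
    n, d = Q.shape
    out = np.empty_like(V)
    for i in range(n):
        lo = max(0, i - window + 1)
        k, v = K[lo:i + 1], V[lo:i + 1]
        if lo > 0:  # coarse stand-in for the "extremely long" context
            k = np.vstack([K[:lo].mean(0, keepdims=True), k])
            v = np.vstack([V[:lo].mean(0, keepdims=True), v])
        w = softmax(k @ Q[i] / np.sqrt(d))
        out[i] = w @ v
    return out
```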
> Context requires quadratic VRAM
Even if this is not solved, there is so much economic benefit that tens of TB of VRAM will become feasible.
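Worth sizing, though. Even with purely linear KV caching (no quadratic term at all), a hypothetical 80-layer model with 8 GQA heads of dim 128 in fp16 needs roughly:

```python
def kv_cache_bytes(n_tokens, layers=80, kv_heads=8, head_dim=128, dtype_bytes=2):
    # K and V vectors per token, per layer, per KV head (hypothetical shape).
    return 2 * n_tokens * layers * kv_heads * head_dim * dtype_bytes

for n in (200_000, 10_000_000, 1_000_000_000_000):
    print(f"{n:>16,} tokens -> {kv_cache_bytes(n) / 2**40:,.2f} TiB")
```

That's about 0.06 TiB at 200k tokens, ~3 TiB at 10M, and ~300,000 TiB at a trillion; under these made-up model dimensions, tens of TB of VRAM buys a cache on the order of 100M tokens, not a trillion.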
Even if your context is a trillion tokens in length, the problem of creating that context still exists. It's still ETL and systems integration.
The model can take actions on the computer: give it access to the company wiki and Slack, and it can create its own context.
Y'all really are just assuming this technology will stand still instead of extrapolating from trends. A model that can score 25% on FrontierMath will probably soon be able to navigate your company Slack; that is not a harder problem than developing expert-level math proofs.