Comment by cs702

20 days ago

> if you use enough terms in the Taylor expansion to get the same result as standard attention to within machine precision, the resulting constant state size should give you an upper bound for the amount of data the LLM can effectively retrieve from its context.

I think you've nailed it: Machine precision puts an upper bound (of constant size) on how much information an LLM can retrieve from its context.
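
To get a feel for how big that constant is, here is a rough back-of-the-envelope sketch in Python. It rests on assumptions that are mine, not from the thread: attention logits are bounded in absolute value by some `max_abs_logit`, the Lagrange remainder is used as a conservative estimate of how many Taylor terms of exp are needed to reach float64 relative precision, and the state holds one accumulator per monomial of degree up to N in the key dimensions, for each value channel plus a normalizer. The function names and the `d_key` / `d_value` parameters are hypothetical.

```python
import math
import sys


def taylor_terms_needed(max_abs_logit: float,
                        rel_tol: float = sys.float_info.epsilon) -> int:
    """Smallest Taylor degree N for exp(x) that guarantees relative error
    <= rel_tol on [-max_abs_logit, max_abs_logit] (conservative bound)."""
    if max_abs_logit <= 0:
        raise ValueError("max_abs_logit must be positive")
    b = max_abs_logit
    # Lagrange remainder: |exp(x) - T_N(x)| <= exp(b) * b**(N+1) / (N+1)!
    # for |x| <= b. The smallest value of exp on the interval is exp(-b),
    # so we require remainder <= rel_tol * exp(-b).
    # Everything is compared in log space to avoid overflow.
    target = math.log(rel_tol) - b
    n = 0
    while b + (n + 1) * math.log(b) - math.lgamma(n + 2) > target:
        n += 1
    return n


def state_size(d_key: int, degree: int, d_value: int) -> int:
    """Number of scalars in the constant-size state, assuming one
    accumulator per monomial of degree <= `degree` in `d_key` variables,
    for each of `d_value` output channels plus one normalizer."""
    monomials = math.comb(d_key + degree, degree)
    return monomials * (d_value + 1)


if __name__ == "__main__":
    # Illustrative values only; a head dimension of 64 is an assumption.
    n = taylor_terms_needed(max_abs_logit=4.0)
    print("Taylor degree:", n)
    print("state scalars:", state_size(d_key=64, degree=n, d_value=64))
```

Under these assumptions the state is constant in context length, but it grows combinatorially with the key dimension and the Taylor degree, so the resulting upper bound is finite yet can be very large.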