Comment by esperent
11 hours ago
Is it because of caching? If the context changes arbitrarily every turn then you would have to throw away the cache.
So use a block-based cache and tune the block size to maximize the hit rate? This isn't rocket science.
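This seems misguided: you have to cache a prefix because of attention. With causal attention, the cached keys and values at each position depend on all earlier tokens, so a cached block is only valid if everything before it is unchanged.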
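Here is a toy sketch of how those two replies fit together. It assumes a block-based KV cache in which each block's key hashes the block's tokens together with the key of the block before it. This is similar in spirit to how prefix-caching inference servers such as vLLM key their blocks, but the class and function names below are hypothetical, the block size of 4 is arbitrary, and a placeholder string stands in for the real key/value tensors.

```python
import hashlib
from typing import Dict, List, Sequence

BLOCK_SIZE = 4  # hypothetical; chosen small so the example is easy to follow


def block_keys(tokens: Sequence[int], block_size: int = BLOCK_SIZE) -> List[str]:
    """Return one cache key per *full* block, each chained to the previous key.

    Block i's key hashes its own tokens plus block i-1's key, so identical
    tokens get different keys whenever anything earlier in the sequence differs.
    That mirrors causal attention: a position's K/V depend on the whole prefix.
    """
    keys: List[str] = []
    parent = ""
    for i in range(len(tokens) // block_size):
        chunk = tokens[i * block_size:(i + 1) * block_size]
        h = hashlib.sha256()
        h.update(parent.encode())
        h.update(",".join(map(str, chunk)).encode())
        parent = h.hexdigest()
        keys.append(parent)
    return keys


class PrefixBlockCache:
    """Toy block cache: maps chained block keys to placeholder KV data."""

    def __init__(self, block_size: int = BLOCK_SIZE) -> None:
        self.block_size = block_size
        self.store: Dict[str, str] = {}

    def insert(self, tokens: Sequence[int]) -> None:
        for key in block_keys(tokens, self.block_size):
            self.store[key] = "kv"  # stand-in for real key/value tensors

    def hit_blocks(self, tokens: Sequence[int]) -> int:
        """Count leading blocks that can be reused, stopping at the first miss."""
        hits = 0
        for key in block_keys(tokens, self.block_size):
            if key not in self.store:
                break  # every later key is chained to this one, so it misses too
            hits += 1
        return hits


cache = PrefixBlockCache(block_size=4)
prompt = list(range(12))                     # 3 full blocks
cache.insert(prompt)

same_prefix = prompt + [99, 100, 101, 102]   # old prompt is a prefix
edited = [42] + prompt[1:]                   # only the first token changed
edited_mid = prompt[:5] + [77] + prompt[6:]  # one token changed in block 1

print(cache.hit_blocks(same_prefix))  # 3
print(cache.hit_blocks(edited))       # 0
print(cache.hit_blocks(edited_mid))   # 1
```

Blocks buy you fine-grained sharing and eviction, and the block size does trade off hit granularity against overhead, but reuse still stops at the first changed token. If the context is rewritten arbitrarily near the start every turn, almost nothing downstream can be reused, whatever the block size.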