Comment by FuckButtons
6 hours ago
So use a block-based cache and tune the block size to maximize the hit rate? This isn’t rocket science.
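This seems misguided. Because of causal attention, the cached keys/values for a token depend on every token before it, so a cached block is only reusable when the entire prefix in front of it matches too. You end up caching prefixes, not independent blocks.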
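
To make the prefix point concrete, here is a rough sketch (not any particular engine's implementation) of how a block cache can still work if each block's key also folds in the key of the block before it. The names `block_keys`, `shared_prefix_blocks`, and `BLOCK_SIZE` are made up for illustration, and SHA-256 over token IDs is just an assumed keying scheme.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical block size, in tokens


def block_keys(tokens, block_size=BLOCK_SIZE):
    """Return one cache key per full block of tokens.

    Each key hashes the block's tokens together with the previous block's
    key, so a key identifies the whole prefix up to and including the block.
    """
    keys = []
    parent = b""  # the first block has no parent
    for start in range(0, len(tokens) - block_size + 1, block_size):
        block = tokens[start:start + block_size]
        digest = hashlib.sha256(parent + repr(block).encode()).hexdigest()
        keys.append(digest)
        parent = digest.encode()
    return keys


def shared_prefix_blocks(a, b, block_size=BLOCK_SIZE):
    """Count the leading blocks whose keys match between two sequences."""
    count = 0
    for key_a, key_b in zip(block_keys(a, block_size), block_keys(b, block_size)):
        if key_a != key_b:
            break
        count += 1
    return count


if __name__ == "__main__":
    base = [1, 2, 3, 4, 5, 6, 7, 8]
    same_start = [1, 2, 3, 4, 9, 9, 9, 9]   # first block matches base
    diff_start = [0, 2, 3, 4, 5, 6, 7, 8]   # second block's tokens match base
    print(shared_prefix_blocks(base, same_start))
    print(shared_prefix_blocks(base, diff_start))
```

In this sketch the second block of `diff_start` holds the same tokens as the second block of `base`, but it gets a different key because the block in front of it differs, so it would be a cache miss. Tuning the block size changes how finely a shared prefix can be reused; it does not let you reuse a block out of context.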