Comment by FuckButtons
5 hours ago
So use a block-based cache and tune the block size to maximize the hit rate? This isn’t rocket science.
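This seems misguided: because of attention, you have to cache a prefix. Each token's keys and values depend on every token before it, so a cached block is only reusable when everything preceding it matches too.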
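
To make the prefix point concrete, here's a minimal sketch of how a block-based cache would probably have to key its blocks. It's illustrative only, not any particular inference server's implementation: `BLOCK_SIZE`, `block_keys`, and `cached_prefix_len` are made-up names, and a plain `set` stands in for the real KV store. The idea is that each block's key chains in the key of the block before it, so a block only hits when the entire prefix up to it is identical.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical block size, in tokens


def block_keys(tokens, block_size=BLOCK_SIZE):
    """Return one cache key per full block of tokens.

    Each key hashes the previous block's key together with the current
    block's tokens, so a key identifies the whole prefix, not just the block.
    """
    keys = []
    parent = b""
    full = len(tokens) - len(tokens) % block_size  # ignore a trailing partial block
    for start in range(0, full, block_size):
        block = tokens[start:start + block_size]
        key = hashlib.sha256(parent + repr(block).encode()).digest()
        keys.append(key)
        parent = key
    return keys


def cached_prefix_len(tokens, cache, block_size=BLOCK_SIZE):
    """Count the leading tokens whose blocks are all present in the cache."""
    reusable = 0
    for key in block_keys(tokens, block_size):
        if key not in cache:
            break  # the first miss ends the reusable prefix
        reusable += block_size
    return reusable


# Assumed example: token IDs are arbitrary integers.
seen = [1, 2, 3, 4, 5, 6, 7, 8]
cache = set(block_keys(seen))

same_start = [1, 2, 3, 4, 9, 9, 9, 9]       # shares only the first block
changed_start = [9, 2, 3, 4, 5, 6, 7, 8]    # second block has the same tokens, different prefix

print(cached_prefix_len(same_start, cache))
print(cached_prefix_len(changed_start, cache))
```

In the second case the block `[5, 6, 7, 8]` holds exactly the same tokens as a cached block, but it can't be reused because the tokens before it differ. Tuning the block size changes how much of a shared prefix you can recover, not whether you need a shared prefix in the first place.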