Comment by kovek
3 hours ago
10s of GBs? (1,000,000 context × 1,000 vector size)^2 = 1,000,000,000,000,000,000… oh wow, I must be miscalculating. (The square is the mistake: a KV cache grows linearly with context, not quadratically.)
What about storing only the conversation text and recomputing the cached embeddings on demand? Does that cost much? A lot of matrix multiplication doesn't cost dollars of compute, especially on specialized hardware, right?
Context length 1e6 × vector length 1e3 × 1e2 model layers = a cache of 100e9 values. Costs will go up even more with a richer latent space and more model layers, and the western frontier outfits are reasonably likely to be maximizing both.
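A quick sketch of that arithmetic; every parameter here is an illustrative assumption from the thread, not any specific model's config, and the per-value byte count and K/V doubling are generic fp16 KV-cache conventions:

```python
# Back-of-envelope KV-cache size (all numbers assumed for illustration).
context_len = 1_000_000    # tokens of context
hidden_size = 1_000        # "vector size" per token per layer
num_layers = 100           # model depth
bytes_per_value = 2        # fp16
kv_tensors = 2             # one key tensor and one value tensor per layer

values = context_len * hidden_size * num_layers
cache_bytes = values * kv_tensors * bytes_per_value

print(values)                  # 100_000_000_000 cached values (100e9)
print(cache_bytes / 1e9)       # ~400 GB
```

So even without the erroneous squaring, a million-token cache at these assumed dimensions lands in the hundreds of gigabytes, which is why "just keep it in the cache" isn't free.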