Comment by vanviegen

11 hours ago

They probably already do that. But these caches can get pretty big (10s of GBs per session), so that adds up fast, even for cold storage.

10s of GBs? (1,000,000 context × 1,000 vector size)² = 1,000,000,000,000,000,000… oh wow, I must be miscalculating.

What about storing only the conversation and recomputing the embeddings into the cache when needed? Does that cost a lot? A lot of matrix multiplication doesn't cost dollars of compute, especially on specialized hardware, right?
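As a rough sketch of why recomputation isn't free: a common estimate is ~2 FLOPs per model parameter per token for a forward pass. All numbers below (parameter count, throughput) are illustrative assumptions, not figures from any vendor or the thread:

```python
# Back-of-envelope prefill cost for recomputing a KV cache from scratch.
# Every constant here is an assumption for illustration only.
params = 1e12                  # assumed model parameter count (1T)
tokens = 1e6                   # context length to re-process
flops = 2 * params * tokens    # ~2 FLOPs per parameter per token (standard estimate)
gpu_flops = 1e15               # assumed ~1 PFLOP/s sustained accelerator throughput
seconds = flops / gpu_flops
print(f"{flops:.1e} FLOPs, ~{seconds:.0f} s at {gpu_flops:.0e} FLOP/s")
```

So under these assumptions, recomputing a full 1M-token cache is on the order of thousands of accelerator-seconds per session, which is cheap in dollars but not negligible in latency.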

  • Context length 1e6 × vector length 1e3 × 1e2 model layers gives ~100e9 cached values. Costs will climb further with a richer latent space and more model layers, and the Western frontier outfits are reasonably likely to be maximizing both.