Comment by mycall

4 hours ago

While it seems like a good idea, resending a growing context window is very inefficient and costly. Instance pinning would bring huge efficiency gains but would also collapse LLM provider revenue. This is something open models could solve better.
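To put a rough number on "resending a growing context", here is a minimal Python sketch that counts only the tokens uploaded over a conversation. It assumes (hypothetically) that each turn adds a fixed 2,000 tokens and that the full history is resent every turn. It ignores server-side prompt caching and compute cost, so it illustrates only the bandwidth side of the argument.

```python
def resend_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total tokens uploaded if the full context is resent every turn."""
    # Turn k carries all k turns of context so far: sum over k=1..n of k*T.
    return tokens_per_turn * turns * (turns + 1) // 2


def pinned_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total tokens uploaded if the server keeps the context (instance pinning)."""
    # Only the new tokens for each turn cross the wire.
    return tokens_per_turn * turns


if __name__ == "__main__":
    T = 2_000  # hypothetical number of tokens added per turn
    for n in (10, 50, 100):
        r, p = resend_tokens(n, T), pinned_tokens(n, T)
        print(f"{n:>3} turns: resend={r:>10,} pinned={p:>8,} ratio={r / p:.1f}x")
```

In this toy model the upload grows quadratically with the number of turns when resending, versus linearly when the context stays pinned; the ratio is (n + 1)/2, i.e. 50.5x at 100 turns.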

Even a max-size context window is what, ~1M tokens? IIRC tokens generally come from a vocabulary of ~300k. Assuming no compression before the encryption (no clue if this is true, but compressing text before encrypting it can leak information about the message, namely how compressible it is), that's log2(300k) ≈ 18.2 bits per token, so 19 bits, or a bit over 2 bytes. So each "turn" would involve roughly 2.4MB extra in each direction (3MB if each token ID is padded to whole bytes). And again, this is assuming a maxed-out context.
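The back-of-the-envelope arithmetic above can be checked with a short Python sketch. It takes the comment's own figures (~1M-token context, ~300k vocabulary) and assumes, as the comment does, that the context travels as raw, uncompressed token IDs; the function names are illustrative only.

```python
import math


def bits_per_token(vocab_size: int) -> int:
    """Whole bits needed to give every token in the vocabulary a distinct ID."""
    return math.ceil(math.log2(vocab_size))


def payload_bytes(context_tokens: int, vocab_size: int, byte_aligned: bool = False) -> int:
    """Uncompressed size of a context sent as raw token IDs (one direction)."""
    bits = bits_per_token(vocab_size)
    if byte_aligned:
        # Each ID rounded up to whole bytes (e.g. 19 bits -> 3 bytes).
        return context_tokens * math.ceil(bits / 8)
    # IDs bit-packed back to back; round the total up to whole bytes.
    return math.ceil(context_tokens * bits / 8)


if __name__ == "__main__":
    ctx, vocab = 1_000_000, 300_000  # figures taken from the comment
    print(f"log2({vocab}) = {math.log2(vocab):.2f} -> {bits_per_token(vocab)} bits/token")
    print(f"bit-packed:   {payload_bytes(ctx, vocab) / 1e6:.2f} MB")
    print(f"byte-aligned: {payload_bytes(ctx, vocab, byte_aligned=True) / 1e6:.2f} MB")
```

Since 2^18 = 262,144 is just short of 300k, a 300k vocabulary needs 19 bits per ID, which is where the ~2.4MB (bit-packed) and 3MB (byte-aligned) figures come from.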

seems plausibly fine

Can you elaborate? How could it be more efficient and bad for revenue? Would it also be bad for profit?

  • More efficient in terms of bandwidth, since the context no longer has to be resent. More costly because the context has to be stored somewhere instead.