Comment by yawnxyz
2 years ago
Does this mean you can store somewhat lossy text/data only as embeddings? Or is that generally a not so good idea...
The claim of the paper is that you can store it losslessly! If you assume you have free access to an LLM, then text is extremely compressible, and an embedding has plenty of bits to hold the compressed form.
... which reminds me of recent research on using lossless compression (plus kNN) for text classification.
https://news.ycombinator.com/item?id=36707193
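That compression-plus-kNN idea can be sketched in a few lines of stdlib Python. This is an illustrative toy, not the paper's implementation: it uses gzip to approximate normalized compression distance (NCD) and a made-up six-sentence training set.

```python
import gzip

def ncd(a: str, b: str) -> float:
    """Approximate normalized compression distance using gzip."""
    ca = len(gzip.compress(a.encode()))
    cb = len(gzip.compress(b.encode()))
    cab = len(gzip.compress((a + " " + b).encode()))
    return (cab - min(ca, cb)) / max(ca, cb)

def classify(query: str, labeled: list[tuple[str, str]], k: int = 3) -> str:
    """Majority vote over the k nearest labeled texts under NCD."""
    neighbours = sorted(labeled, key=lambda pair: ncd(query, pair[0]))[:k]
    labels = [label for _, label in neighbours]
    return max(set(labels), key=labels.count)

# Toy training data for illustration only.
train = [
    ("the team won the championship game last night", "sports"),
    ("the striker scored a hat-trick in the final", "sports"),
    ("the goalkeeper saved two penalties in the final", "sports"),
    ("the central bank raised interest rates again", "finance"),
    ("stock markets fell after the earnings report", "finance"),
    ("investors worry about inflation and bond yields", "finance"),
]

print(classify("the striker scored the winning goal in the final", train))
```

No training or model is involved at all; the compressor's ability to exploit shared substrings stands in for a similarity function, which is exactly why the linked result was so surprising.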
Yes, you can do this.
This is how you would implement a "medium-term" memory. Folks in the sentence-transformers world have known this forever, yet the wider NLP world ignores it in the context of chatbots, despite how powerful it and related concepts like soft prompts / textual inversion are.
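A minimal sketch of that medium-term memory pattern: store past turns as vectors, then recall the most similar ones at query time. Everything here is hypothetical scaffolding — the `embed` function is a toy bag-of-words stand-in (it only matches shared words); a real system would call a sentence-transformers model such as all-MiniLM-L6-v2 instead.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": word counts. Swap in a real sentence-transformers
    # model here for actual semantic similarity.
    return Counter(text.lower().split())

def cosine(u: Counter, v: Counter) -> float:
    dot = sum(u[w] * v[w] for w in u)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

class MediumTermMemory:
    """Remember past conversation turns; recall the most relevant ones."""

    def __init__(self) -> None:
        self.store: list[tuple[Counter, str]] = []

    def remember(self, text: str) -> None:
        self.store.append((embed(text), text))

    def recall(self, query: str, k: int = 2) -> list[str]:
        qv = embed(query)
        ranked = sorted(self.store, key=lambda it: cosine(qv, it[0]),
                        reverse=True)
        return [text for _, text in ranked[:k]]

memory = MediumTermMemory()
memory.remember("user prefers metric units")
memory.remember("user is planning a trip to Lisbon")
memory.remember("user asked about gluten-free recipes")
print(memory.recall("packing list for the trip to Lisbon", k=1))
```

The recalled turns would then be prepended to the chatbot's prompt, which is the "medium-term" part: cheaper than fine-tuning, longer-lived than the context window.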
It's a wonderful technique and the fact that it's not used in ChatGPT and other tools like it is a shocking shame.