Comment by 3s

1 year ago

This is an interesting idea, and indeed such an approach to providing privacy has been formalized to different degrees and with varying levels of success (e.g. [1][2]).

[1] https://arxiv.org/abs/1204.2136 [2] https://arxiv.org/abs/2210.03458

Unfortunately, as described, such a solution would only satisfy a somewhat meaningless notion of privacy. Specifically, the embeddings by definition contain potentially private information about the user, revealing things like "I'm asking about birds", to use your example. Even though the embedding might "compress" the query in a slightly lossy way, it would still reveal a great deal of information about the query.

A true solution to this problem would require something like differential privacy, i.e. adding calibrated noise to the embeddings. However, the noise required would (likely) end up destroying too much of the information in the embedding to preserve the accuracy of the LLM.
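
To make the noise argument concrete, here is a rough sketch using the classic Gaussian mechanism. Everything in it is an assumption for illustration rather than something from the papers above: the embedding is a random 768-dimensional unit vector standing in for a real one, any two queries are treated as "neighbors" (so the L2 sensitivity is 2, the diameter of the unit sphere), and the function names (`gaussian_sigma`, `privatize`) are hypothetical.

```python
import math
import random


def l2_normalize(v):
    """Scale a vector to unit L2 norm so the sensitivity bound below holds."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def gaussian_sigma(epsilon, delta, sensitivity):
    """Noise scale from the classic Gaussian mechanism.

    This calibration is the textbook one, stated for 0 < epsilon < 1.
    """
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def privatize(embedding, epsilon, delta, rng):
    """Add i.i.d. Gaussian noise to every coordinate of a unit-norm embedding.

    Assumption: any two queries count as neighbors, so two unit vectors can
    differ by at most 2 in L2 norm, which is the sensitivity used here.
    """
    unit = l2_normalize(embedding)
    sigma = gaussian_sigma(epsilon, delta, sensitivity=2.0)
    return [x + rng.gauss(0.0, sigma) for x in unit]


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


rng = random.Random(0)
dim = 768  # assumed embedding width, purely illustrative

# Stand-in for a real query embedding: a random direction on the unit sphere.
embedding = l2_normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])
noisy = privatize(embedding, epsilon=0.5, delta=1e-5, rng=rng)

print("sigma per coordinate:", gaussian_sigma(0.5, 1e-5, 2.0))
print("cosine(original, noisy):", cosine(embedding, noisy))
```

Under these assumptions the per-coordinate noise scale is roughly 19, while each coordinate of a unit vector in 768 dimensions is on the order of 0.04, so the noisy vector ends up nearly orthogonal to the original. A weaker neighbor definition or a larger epsilon would shrink the noise, but at the cost of a weaker guarantee.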