Comment by rkagerer
3 years ago
Thanks. I knew homomorphic encryption let's you do math on encrypted data without knowing its contents. But it hadn't occurred to me you can keep one side of the arithmetic in plaintext. It also helped when gojomo pointed out that every query touches every byte of the full dataset.
So basically (if I've got this right and at the risk of over-simplifying): Client computes a bit mask and encrypts it, sends that to the server, and says "do a big multiplication of this against Wikipedia content". That leaves only the relevant portion in the encrypted result, which only the client can decrypt.
How does the client know which bits in the initial "mask" (aka one-hot vector) to set? Does it have to download a full index of article titles? (So in a sense this is just a big retrieval system that pulls an array element based on its index, not a freeform search tool).
Yes, all of the mapping of article titles to index happens locally.
If the index got too large, we could use a hash to bucket things, but this incurs some waste and would only support exact article title matches.
Yes, it is definitely not a freeform search tool.