← Back to context

Comment by blintz

3 years ago

The server has a snapshot of Wikipedia, sitting in memory. The server is blind to which article is requested because it computes a homomorphic dot product between an encrypted one-hot vector (encrypted under a key that only the client knows) and the total set of articles in plaintext. The encrypted query does not reveal anything about the requested index (ie it is semantically secure).

The 'magic' of homomorphic encryption is that the server is able to take an encrypted one-hot vector of bits, and a vector of plaintext articles, compute a homomorphic dot product, and produce a single ciphertext encoding the single desired plaintext article. This single output ciphertext is crucially also encrypted under the client's secret key, so it reveals nothing to the server.

Thanks. I knew homomorphic encryption let's you do math on encrypted data without knowing its contents. But it hadn't occurred to me you can keep one side of the arithmetic in plaintext. It also helped when gojomo pointed out that every query touches every byte of the full dataset.

So basically (if I've got this right and at the risk of over-simplifying): Client computes a bit mask and encrypts it, sends that to the server, and says "do a big multiplication of this against Wikipedia content". That leaves only the relevant portion in the encrypted result, which only the client can decrypt.

How does the client know which bits in the initial "mask" (aka one-hot vector) to set? Does it have to download a full index of article titles? (So in a sense this is just a big retrieval system that pulls an array element based on its index, not a freeform search tool).

  • Yes, all of the mapping of article titles to index happens locally.

    If the index got too large, we could use a hash to bucket things, but this incurs some waste and would only support exact article title matches.

    Yes, it is definitely not a freeform search tool.

Don't you have to pad the output ciphertext size to match the largest article you could possibly request from the set of articles? Or is a fixed-size output an inherent property of homomorphic encryption schemes? Otherwise it seems to reveal something to the server just by the size of the ciphertext (since WP articles vary in size).

  • The output is always going to be fixed size.

    The nicer way to look at homomorphic encryption is like math, but the more literal way is like a bunch of logic gates. There are N output wires. You control the bits on the input wires. Nothing you do will change the number of wires.

  • Yes, every item needs to be the same size. We batch the articles into 100KB chunks, and we split articles larger than this into multiple chunks.

    • Wouldn't you then have to send out multiple ciphertexts (for articles >100 KB)? Which would leak something about the size of the article...

      1 reply →