← Back to context

Comment by ElFitz

6 months ago

> Now it looks to me that the whole input must be encrypted with key k. But in the search example, the inputs include a query […] and a multi-terabyte database […]

That’s not the understanding I got from Apple’s CallerID example[0][1]. They don’t seem to be making an encrypted copy of their entire database for each user.

[0]: https://machinelearning.apple.com/research/homomorphic-encry...

[1]: https://machinelearning.apple.com/research/wally-search

They do not explicitly state this fact, but they link to the homomorphic encryption scheme they're using, which works like this. To perform an operation between a plaintext value and an encrypted value, you first encrypt the plaintext with the public key and then you can do your operation on the encrypted values to get the encrypted output.

Moreover, even if the details were slightly different, a scheme that reveals absolutely no information about the query while interacting with a database always needs to do a full scan. If some parts remain unread depending on the query, this tells you what the query wasn't. If you're okay with revealing some information, you can also hash the query and take a short prefix of the hash with many colliders, then only scan values with the same hash prefix. This is how browsers typically do safe browsing lookups, but by downloading that subset of the database instead of doing the comparison homomorphically on the server.

  • Is that what they mean in the Wally paper post by

    > In previous private search systems, for each client query, the server must perform at least one expensive cryptographic operation per database entry.

    ?

    • Exactly. (I had only looked at the homomorphic encryption post, not the Wally post.) Wally tries to work around this limitation by only using homomorphic encryption for a subset of the database, and reducing the resulting information leakage by using an anonymous network to hide which client is querying which subset. They say this network is operated by a third party, but ultimately you still have to trust that the network operator isn't colluding with the server operator to deanonymize your queries. That's a weaker privacy guarantee, but at least it's not painfully slow.