← Back to context

Comment by teo_zero

6 months ago

I think the opening example involving Google is misleading. When I hear "Google" I think "search the web".

The articles is about getting an input encrypted with key k, processing it without decrypting it, and sending back an output that is encrypted with key k, too. Now it looks to me that the whole input must be encrypted with key k. But in the search example, the inputs include a query (which could be encrypted with key k) and a multi-terabyte database of pre-digested information that's Google's whole selling point, and there's no way this database could be encrypted with key k.

In other words this technique can be used when you have the complete control of all the inputs, and are renting the compute power from a remote host.

Not saying it's not interesting, but the reference to Google can be misunderstood.

> Now it looks to me that the whole input must be encrypted with key k. But in the search example, the inputs include a query […] and a multi-terabyte database […]

That’s not the understanding I got from Apple’s CallerID example[0][1]. They don’t seem to be making an encrypted copy of their entire database for each user.

[0]: https://machinelearning.apple.com/research/homomorphic-encry...

[1]: https://machinelearning.apple.com/research/wally-search

  • They do not explicitly state this fact, but they link to the homomorphic encryption scheme they're using, which works like this. To perform an operation between a plaintext value and an encrypted value, you first encrypt the plaintext with the public key and then you can do your operation on the encrypted values to get the encrypted output.

    Moreover, even if the details were slightly different, a scheme that reveals absolutely no information about the query while interacting with a database always needs to do a full scan. If some parts remain unread depending on the query, this tells you what the query wasn't. If you're okay with revealing some information, you can also hash the query and take a short prefix of the hash with many colliders, then only scan values with the same hash prefix. This is how browsers typically do safe browsing lookups, but by downloading that subset of the database instead of doing the comparison homomorphically on the server.

    • Is that what they mean in the Wally paper post by

      > In previous private search systems, for each client query, the server must perform at least one expensive cryptographic operation per database entry.

      ?

      1 reply →

Homomorphically encrypted services don't need a priori knowledge of the encryption key. That's literally the whole point.

Consider the following (very weak) encryption scheme:

m, k ∈ Z[p], E(m) = m * k mod p, D(c) = c * k⁻¹ mod p

With this, I can implement a service that receives two cyphertexts and computes their encrypted sum, without knowledge of the key k:

E(x) + E(y) = x * k + y * k mod p = (x + y) * k mod p = E(x + y)

Of course, such a service is not too interesting, but if you could devise an algebraic structure that supported sufficiently complex operations on cyphertexts (and with a stronger encryption), then by composing these operations one could implement arbitrarily complex computations.

  • In the case of searching Google, E(x) is the encrypted query and y is Google's database. Can you compute E(x + y) without doing at least as much work as computing E(y)? I don't think so. Instead, you use public key cryptography so that the server can compute E(y) (yes, encrypting the entire database) without being able to decrypt D(E(x)) = x.

    • Wrong. In the case of searching, the database is the function that you feed the input query into.

      E.g. consider the following system:

      E(x) = x ^ k, D(x) = x ^ k

      So a one-time pad. Let's say that I provide a service that lets you decide whether a number is even or odd:

      IsOdd(E(x)) = E(x) mod 2

      You give it an encrypted number, and it gives you back an encrypted bit that you can decrypt to see if the original number was even or not. All while it has zero knowledge of your plaintext number.

      A homomorphically encrypted database is just like IsOdd(x), except orders of magnitude more complex. One idea is that any computation can be turned into a Boolean circuit, so if you have homomorphic building blocks for Boolean circuits, you can implement any computation. Obviously, some caveats apply, like all loops have to be unrolled, etc. That's why the whole thing is so inefficient. But mathematically it works.

      6 replies →

I don't know, when I hear Google I hear Gmail, Google docs, and every other service they have to know about people. My mom would probably think mostly about search, but then she would not read an article about HME