Comment by perlgeek

6 months ago

FHE might allow arbitrary computation, but I use most services because they have some data I want to use: their search index, their knowledge, their database of chemicals, my bank account transactions, whatever.

So unless Google lets me encrypt their entire search index, they can still see my query at the time it interacts with the index, or else they cannot fulfill it.

The other point is incentives: outside of a very few high-trust, high-stakes applications, I don't see why companies would go through the trouble of offering FHE services.

From what I understand, only the sensitive data needs to be encrypted (e.g. your bank transactions). It is still possible to use public unencrypted data in the computation, as the function you want to compute doesn't have to be encrypted.

  • In a world where Target can figure out a woman is pregnant before she knows herself due to her shopping habits, the line that separates sensitive data is pretty ambiguous.

Exactly what I thought. In the end, it really isn't in most big corps' interest to not see your data or query. They need or want to see it, so why would they degrade their ability to do so when they can just say no and leave you to use their services without FHE? For banking applications, cool; for everyone else, it's debatable whether it will ever be accepted.

You're right about incentives, but wrong about the first part. Private lookups of a plaintext database are possible and have been for a while now (5+ years?). The problem is it often requires some nontrivial preprocessing of the plaintext database, or in the worst case a linear scan of the entire database.
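
To make the "linear scan" case concrete, here is a minimal sketch of a private lookup against a plaintext database. It uses textbook Paillier (additively homomorphic), which is a swapped-in stand-in and not the scheme any particular paper or product uses. The key size, the record values, and the helper names (`encrypt`, `decrypt`, `make_query`, `server_answer`) are all assumptions made up for illustration.

```python
import math
import secrets

# Toy Paillier key built from two Mersenne primes. Far too small for
# real security; chosen only so the arithmetic stays readable.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1)
MU = pow(LAM, -1, N)  # valid because the generator is g = N + 1


def encrypt(m: int) -> int:
    """Enc(m) = (1 + m*N) * r^N mod N^2, i.e. Paillier with g = N + 1."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (1 + m * N) % N2 * pow(r, N, N2) % N2


def decrypt(c: int) -> int:
    """Recover m from c using the private values LAM and MU."""
    return (pow(c, LAM, N2) - 1) // N * MU % N


def make_query(index: int, size: int) -> list:
    """Client side: an encrypted one-hot vector selecting `index`."""
    return [encrypt(1 if i == index else 0) for i in range(size)]


def server_answer(database: list, query: list) -> int:
    """Server side: touch every record, never learning which one is wanted.

    Multiplying Enc(q_i)^record_i yields Enc(sum(record_i * q_i)).
    """
    acc = 1  # trivial encryption of 0
    for record, ciphertext in zip(database, query):
        acc = acc * pow(ciphertext, record, N2) % N2
    return acc


# Public plaintext records held by the server (each must be below N).
database = [1105, 2718, 3141, 4242, 5772]
wanted = 2
answer = decrypt(server_answer(database, make_query(wanted, len(database))))
print(answer)  # prints 3141
```

The server only ever combines its own plaintext records with ciphertexts, so the data itself never has to be encrypted. The cost is visible in the loop: every record is touched on every query, and in this naive form the client uploads one ciphertext per record.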

  • > Private lookups of a plaintext database are possible and have been for a while now (5+ years?). The problem is it often requires some nontrivial preprocessing of the plaintext database, or in the worst case a linear scan of the entire database.

    So that basically means that if a company has data that my program might want to use, the entirety of that data needs to be loaded into my program. Not quite feasible for something like the Google search index, which (afaik) doesn't even fit onto a single machine.

    Also, while Google is fine with us doing searches, making the whole search index available to a homomorphically encrypted program is probably quite a different beast.

    • You can process the data such that only a structured lookup table is shared with the client. That data structure is massive.

      The use case isn't really “search Google without them knowing my query”, it's “search my own data without them knowing my data”. Which limits the practically applicable scope considerably.

    • > the entirety of that data needs to be loaded into my program

      What? No. I'm not saying the entire Google search index is feasible, but you can do a lot. Here are some concrete numbers from what is now considered an "old" paper (2022; it has been improved since then):

      https://eprint.iacr.org/2022/949

      To make queries to a 1 GB database [in a scheme called DoublePIR] the client must download a 16 MB "hint" about the database contents; thereafter, the client may make an unbounded number of queries, each requiring 345 KB of communication, and a throughput of 7.4 GB/s/core.

  • Which ultimately results in gigabytes of per-client-encrypted data needing to be downloaded, and regenerated and redownloaded every time the index is updated.
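
For a sense of scale, the DoublePIR figures quoted above can be turned into a quick back-of-the-envelope calculation. This is only arithmetic on those four numbers, assuming decimal units (1 GB = 10^9 bytes) and reading the throughput as database bytes scanned per second per core. The variable names are made up for illustration, and nothing here extrapolates to other database sizes.

```python
# Figures quoted in the thread for DoublePIR on a 1 GB database.
# Decimal units are an assumption (1 GB = 1e9 bytes).
DB_BYTES = 1e9
HINT_BYTES = 16e6        # one-time download
QUERY_BYTES = 345e3      # communication per query
THROUGHPUT = 7.4e9       # bytes scanned per second per core (assumed reading)


def client_traffic(num_queries: int) -> float:
    """Total bytes a client transfers: the hint once, plus each query."""
    return HINT_BYTES + num_queries * QUERY_BYTES


# Core time the server spends answering one query over the full database.
scan_seconds = DB_BYTES / THROUGHPUT

# Size of the hint relative to the database it describes.
hint_fraction = HINT_BYTES / DB_BYTES

# Number of queries at which total traffic would equal simply
# downloading the whole 1 GB database.
break_even_queries = (DB_BYTES - HINT_BYTES) / QUERY_BYTES

print(f"core time per query: {scan_seconds:.3f} s")
print(f"hint size vs database: {hint_fraction:.1%}")
print(f"queries before traffic matches a full download: {break_even_queries:.0f}")
```

On these numbers, one query costs roughly 0.135 s of a single core, the hint is 1.6% of the database, and a client would need around 2,850 queries before its total traffic matched downloading the database outright.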