A High-Level Technical Overview of Homomorphic Encryption

2 years ago (jeremykun.com)

Amazing summary, Jeremy!

One nitpick regarding the double-CRT: you are referring to the RNS encoding, whereas the original paper[0] uses the term to describe how polynomials are stored for fast computation. It's a nice philosophical view: the polynomial Φ_m(X) is decomposed into the products (X − ζ_i) the same way the integer modulus Q is decomposed into primes. So it's more like one CRT on the coefficients, and another implemented as a DFT.

[0] https://eprint.iacr.org/2012/099
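
To make the integer-side CRT concrete, here is a toy RNS sketch in Python (my own illustration, not from the paper): arithmetic mod Q is done residue-wise mod a few small primes, and the result is recombined with the CRT. The polynomial side of double-CRT plays the same game with the roots ζ_i, which is where the DFT/NTT comes in.

  from math import prod

  # toy primes, purely for illustration; real schemes use many word-sized primes
  primes = [97, 101, 103]
  Q = prod(primes)

  def to_rns(x):
      return [x % p for p in primes]

  def from_rns(residues):
      # standard CRT reconstruction
      x = 0
      for r, p in zip(residues, primes):
          m = Q // p
          x += r * m * pow(m, -1, p)
      return x % Q

  a, b = 123456, 654321
  ab_rns = [(ra * rb) % p for ra, rb, p in zip(to_rns(a), to_rns(b), primes)]
  assert from_rns(ab_rns) == (a * b) % Q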

  • This may be some nuance I'm still not quite getting. You're saying that even if you use the coefficient encoding, one would still have reason to store polynomials in the CRT domain. I agree that would be more performant, but it seems not that useful, because then your element-wise products only ever correspond to convolutions (as I mention in the article). As far as I can tell, everyone does Double-CRT with RNS encoding, but I may be mistaken.

for those wondering about the and/xor thing, well, of course a nand b is just 1 xor (a and b), but you can do enormously better than that!

it turns out there's a very pretty formulation of boolean combinational logic (i can't bring myself to say 'circuits') called zhegalkin polynomials, in which arbitrary boolean functions are just polynomials in gf(2). i wrote a simple implementation of them in http://canonical.org/~kragen/sw/dev3/zhegalkin.py.
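
a tiny taste of the idea (a hand-rolled sketch here, not the linked implementation): nand and majority written as xors of and-monomials over gf(2)

  # every boolean function is an xor of and-monomials, i.e. a polynomial over gf(2)
  def nand(a, b):
      return 1 ^ (a & b)                    # 1 + ab

  def majority(a, b, c):
      return (a & b) ^ (b & c) ^ (a & c)    # ab + bc + ca

  # sanity-check against the truth tables
  assert all(nand(a, b) == (0 if a and b else 1)
             for a in (0, 1) for b in (0, 1))
  assert all(majority(a, b, c) == (1 if a + b + c >= 2 else 0)
             for a in (0, 1) for b in (0, 1) for c in (0, 1))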

Again my math ignorance screws me; I wish I had more acumen here to grasp it all, but this did catch my eye: “In one example FHE scheme with lightweight keys, a ciphertext encrypting a single integer is on the order of 25 KB, and the special keys are about 0.5 GB.”

I’ve been trying to puzzle out the market for this kind of technology.

Making truly private data actually useful while retaining the privacy would of course be incredible and the use cases are obvious. But the people that have big data generally are not interested in privacy at all.

Academic interest in FHE is understandable since it's fascinating, but why does industry care? Is industry just hedging its bets for if/when the future brings stronger data regulation?

  • My undergraduate thesis project attempted to use homomorphic encryption to create a voting system where any party could verify the final tally while preserving ballot secrecy.

    • Just like other industries: are politicians generally interested in changing a status quo that already works nicely for them? Based on the history of auditable voting systems in real-world elections, it feels like barely anyone cares.

  • there are a lot of hard security problems that become easy if you can introduce an incorruptible 'trusted third party'. often the computations involved are pretty trivial and involve tiny amounts of data

    want to find out which of your friends have secret crushes on you? you all tell trent, and then he tells you which of your crushes also have crushes on you, and also tells them

    want digital money without double-spending, privacy invasion, or money supply inflation? just let trent keep track of what everyone's balance is. to pay someone, tell trent how much you want to pay them, and he'll decrease your balance and increase theirs. trent promises not to increase anyone's balance without decreasing somebody else's by the same amount

    want to buy a good at the welfare-maximizing price? use a second-price auction: everyone privately tells trent their bid, and trent announces the winner (the person who bid the highest) and the price (the highest losing bid). that way bidding lower never gets someone the same good at a lower price; it just decreases their chance of winning

    want to play poker with some people you don't trust? trent mentally shuffles the deck and tells you what cards he's dealt you (and what other cards are visible, in some variants). at the end of the round, he announces what cards were in every hand that didn't fold, and who won

    there are an infinite number of problems like this

    the trouble with trent is that any human 'trent' is corruptible, and trusting them with secrets gives them more power, and absolute power corrupts absolutely. the humans faff around with checks and balances and institutions and oaths and whatnot but they're pretty fragile. cryptographers have often designed clever protocols which solve individual problems from this infinite set, and some of them are secure

    fhe makes it possible to construct an incorruptible 'trent': everybody can see the program, everybody can verify the operation of the program step by step, but nobody can see the data. it's almost a fully general cryptographic protocol design primitive

    i forget who it was that explained this to me. i thought it was nick szabo but i can't find the essay now

  • I think the AI-model-as-a-service is actually a great use case.

    You want to use their AI model but you don't trust them to not train on your data so you don't want to send your data to them. They don't trust you enough to send you their models.

    So you encrypt your data, they compute on your data and send you the encrypted result back.

    The only problem is that you have turned an expensive computation into one that is orders of magnitude more expensive.

    • > I think the AI-model-as-a-service is actually a great use case.

      It's a good and natural use-case, but a use-case won't necessarily make a market.

      Similar to the sibling comment that mentions voting. It's hard to get excited about new maths allegedly fixing problems in places where we have existing fixes that are ignored. If the simpler fixes aren't acceptable to the incumbent, naturally they'll just rule out bigger and better fixes for the same reasons and won't even bother to explain themselves to the public. (Do we need cryptographically modern voting when we can't even agree to fix stuff like gerrymandering?)

      As an example, just looking at security/compliance as an industry, you'd think that people care about things like "verifiably correct" and yet there's so much of it that is just theater (self-attestation and other pinky-promises). Similarly for most B2B contracts that involve data-sharing and "do not JOIN with .." clauses. That stuff just happens so that outfits like Facebook can disavow any bad behaviour coming from 3rd parties, but it's behaviour they don't actually want to stop because that's the whole business model. Corporations like the theater that we have. And even if it's expensive (contract lawyers, compliance experts), they like that too because it's part of their moat, as long as it doesn't truly impact anything about operations.

      If FHE were going to fix things later when it's matured, I would expect people today to care more about things like certified, legally actionable traces for data-lineage. (Having at least primitive lineage in place is already a cost of doing business, because otherwise you can't reliably work with tons of diverse inputs on tons of diverse models for training. And yet officially Facebook [doesn't know what happens with your data](https://www.vice.com/en/article/akvmke/facebook-doesnt-know-...), and the world seems to have basically accepted that answer.)

      > You want to use their AI model but you don't trust them to not train on your data so you don't want to send your data to them. They don't trust you enough to send you their models.

      Basically saying this facilitates trust between competitors? It's an interesting idea, but I'm skeptical. Seems like Walmart will keep using Microsoft or Google's cloud just because they hate Amazon and don't want to arm the enemy with cash, not because they don't trust the enemy with information. Similarly for, say, American vs Chinese state interests: fixing trust completely won't make it OK to outsource compute, because regardless of the information, they don't even want the money moving that way.

      Setting aside direct competitors, maybe it's a credit/insurance company with private records, and a vendor like Amazon with trained models? In this case they aren't direct competitors but just client/vendor. No one in this arrangement really cares about the privacy of consumers, so a pinky-promise is fine. Any fuck-ups that end in leaks, and both parties have PR ass-coverage because they just blame the other guy. If anyone pays fines, no one cares, because it's less than the cost of doing this work any other way.

      Thinking more about this, maybe I can imagine a real market for FHE with healthcare, because even the giants of surveillance capitalism can agree about both parts of this: they selfishly want their own privacy here, and they also stand to directly benefit from making research on aggregates more easily possible at scale.

      Besides healthcare I'm cynical: probably cloud companies want FHE everywhere so they can sell more compute, and maybe it'll be even more compute-hungry than blockchain/AI. As much as I like the idea of seeing Amazon/Facebook lobbyists fist-fighting each other for the amusement of Congress, maybe we should try simple solutions like basic laws, and enforcement of those laws, before we try redistributing cash from ad-tech to hardware-mongers.

  • FHE is somewhat of a spiritual predecessor to e2e encryption. Imagine a VT100 with an FHE hardware accelerator, hooked up to an AT&T phone line, that can look up phone books and a chat inbox in seconds. FHE also has constant execution time by nature, so it is better suited to realtime systems like good old landline phone networks.

    It also looks like it's mathematically related to Diffie–Hellman key exchange? So it might yield something in that direction in the long term.

  • People are (very) slowly realizing that plastering their data all over third-party clouds is dangerous for their own privacy and safety. Hopefully we can make data breaches existentially costly for companies so they stop fucking around and take privacy seriously. People want FHE; they just don't know it. It solves so many problems, and when it works really well, it will become a selling point for customers. Companies that don't take data privacy seriously will rot.

I wonder if there is some restricted kind of homomorphic encryption, where the encryption is tailored to a certain kind of program. For example, if the program is to compute the average of a list of numbers, then some simpler encryption could be used just to compute the average and nothing more. Is there a related concept for this idea of encryption restricted to a concrete computation?

  • One related concept is “partially homomorphic encryption” (https://en.wikipedia.org/wiki/Homomorphic_encryption#Partial...). These are encryption schemes that are homomorphic under one operation such as addition or multiplication.

    For example, you could use an additively homomorphic scheme to compute a sum of encrypted values. This could then be converted into an average assuming you knew the number of values.
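
    As a concrete toy example (my own sketch, with insecurely tiny primes just for illustration), Paillier is additively homomorphic: multiplying ciphertexts adds the underlying plaintexts.

      import math, random

      p, q = 1009, 1013                      # toy primes; real keys use 1024+ bit primes
      n, n2 = p * q, (p * q) ** 2
      g = n + 1
      lam = math.lcm(p - 1, q - 1)
      L = lambda x: (x - 1) // n
      mu = pow(L(pow(g, lam, n2)), -1, n)

      def encrypt(m):
          r = random.randrange(2, n)
          while math.gcd(r, n) != 1:
              r = random.randrange(2, n)
          return pow(g, m, n2) * pow(r, n, n2) % n2

      def decrypt(c):
          return L(pow(c, lam, n2)) * mu % n

      values = [12, 30, 9]
      cts = [encrypt(v) for v in values]
      c_sum = math.prod(cts) % n2            # ciphertext product = plaintext sum
      assert decrypt(c_sum) == sum(values)   # key holder divides by len(values) for the mean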

  • This is basically what people do with problems like Private Information Retrieval and Private Set Intersection. They use some lightly applied FHE as a subroutine, but otherwise hand-develop a cryptographic protocol for the specific program at hand.

    As it turns out, these applications are much more widely used than FHE.
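
    To sketch the flavor of PIR (a plaintext illustration, nothing encrypted here): the query is a one-hot vector and the answer is a dot product with the database. In the real protocol the query vector is encrypted under an additively homomorphic scheme, so the server does the same arithmetic without learning the index.

      db = [42, 7, 99, 13]                            # server's database
      i = 2                                           # index the client wants
      query = [int(j == i) for j in range(len(db))]   # would be sent encrypted
      answer = sum(q * d for q, d in zip(query, db))  # server-side work
      assert answer == db[i]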

  • I think there are special cases, like Yao's millionaire problem, where you compute a simple predicate to compare two numbers. I do not know whether such a notion will save you much, though, because as soon as you can compute a single simple instruction like SUBLEQ, you have a Turing-complete scheme.
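
    To illustrate that last point, here is a toy sketch (plaintext only, nothing encrypted): a single SUBLEQ instruction is enough to build everything else, e.g. addition.

      def run_subleq(mem, pc=0, max_steps=1000):
          # mem[b] -= mem[a]; jump to c if the result <= 0; a negative pc halts
          for _ in range(max_steps):
              if pc < 0:
                  return mem
              a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
              mem[b] -= mem[a]
              pc = c if mem[b] <= 0 else pc + 3
          return mem

      A, B, Z = 9, 10, 11                # data cells placed after the code
      mem = [A, Z, 3,                    # Z -= A   (Z becomes -A)
             Z, B, 6,                    # B -= Z   (i.e. B += A)
             Z, Z, -1,                   # Z -= Z   (reset Z), then halt
             7, 5, 0]                    # A = 7, B = 5, Z = 0
      run_subleq(mem)
      assert mem[B] == 12                # 7 + 5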

> For example, it is not possible to write an FHE program that uses fewer instructions than the program’s worst-case input. Doing so would reveal information that the input is not the worst-case, which contradicts the security of the cryptography. Instead, an FHE program must eagerly compute all branches of if-statements, and then use a select statement to decide which result should be used.

What exactly is a "select statement"? Curious to know how to select a value without giving away what that value equals (even in encrypted form). Otherwise I assume it would give away as much as seeing which branch was taken.
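
My naive guess at the pattern, as a plaintext sketch: a "select" is an arithmetic multiplexer. Both branches are computed, and the condition bit blends them; in real FHE the condition, both branch values, and the result would all be ciphertexts, so nothing about which branch "won" is revealed.

  def branchless_abs(x):
      cond = int(x < 0)           # in FHE this comparison yields an encrypted bit
      then_val, else_val = -x, x  # both branches are always computed
      return cond * then_val + (1 - cond) * else_val   # the "select"

  assert branchless_abs(-5) == 5 and branchless_abs(7) == 7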

I've been following Jeremy's blog for some years now and although I don't always understand everything, I appreciate it for being a relatively accessible look into the research that's going on.

> you can implement ciphertext-ciphertext multiplication: x.y = ((x^2 + y^2) - (x^2 - y^2))/4

However, this one part confused me: the RHS seems to simplify to y^2 / 2. Is there a mistake here, or is this specific to the polynomial fields being worked in? (Or am I being dumb?) The identity I'd expect is x.y = ((x + y)^2 - (x - y)^2)/4, so perhaps the squares were meant to be of the sum and difference.

> Much of the FHE industry is focused right now on the subset of applications where legal hurdles make the two available options “slow, but compliant” and “fast, but incurs prison time.”

Anyone have any examples of these applications?

How does multiplication work in LWE? Let's say in the non-trippy variant, the "leveled homomorphic encryption".

what are potential benefits and use cases of writing a program using FHE if the program is literally a million times slower than a similar program without FHE?

I believe that homomorphic encryption is one of those technologies that is too dangerous to develop. It is one step down a path where any person on earth will eventually have access to enormous amounts of computation. Is that a good thing? Well, it means one can do high-powered AI research on chemical and biological weapons or advanced malware by using high-powered compute clusters that they may not otherwise have access to. Moreover, it will be impossible to detect, since the computations themselves are encrypted.

  • How does FHE lead to people getting access to more computation?

    • (as far as I understand) it makes trustless compute farms possible in theory

    • Well, more people can buy time on an FHE computer and compute whatever they want on it, like a program that can find even more dangerous viruses, and no one will be able to detect what they are doing because their computations are encrypted. In other words, the data that could give them away is transformed under a homomorphism that disguises that data with encryption so that the nature of the computations is not evident.

  • I think there's a case to be made that FHE will enable "laundering" of responsibility for using cloud resources for evil purposes. But I don't think that will have much to do with high-powered computation.