Comment by blintz
6 months ago
I say this as a lover of FHE and the wonderful cryptography around it:
While it’s true that FHE schemes continue to get faster, they don’t really have hope of being comparable to plaintext speeds as long as they rely on bootstrapping. For deep, fundamental reasons, bootstrapping isn’t likely to ever be less than ~1000x overhead.
When folks realized they couldn't speed up bootstrapping much more, they started talking about hardware acceleration, but it's a tough sell at a time when every last drop of compute is going into LLMs. What $/token cost increase would folks pay for computation under FHE? Unless it's >1000x, it's really pretty grim.
For anything like private LLM inference, confidential computing approaches are really the only feasible option. I don’t like trusting hardware, but it’s the best we’ve got!
There is an even more fundamental reason why FHE cannot realistically be used for arbitrary computation: some computations have much larger asymptotic complexity on encrypted data than on plaintext.
A critical example is database search: searching through a database of n elements is normally done in O(log n), but it becomes O(n) when the search key is encrypted. This means that fully homomorphic Google search is fundamentally impractical, although the same cannot be said of fully homomorphic DNN inference.
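To make that concrete, here is a minimal sketch of why the scan is forced: the server cannot branch on an encrypted comparison, so it has to touch every record and select the match homomorphically. The enc_* operations below are plain-Python stand-ins for homomorphic ops, not a real FHE API:

    # Hypothetical sketch: the server cannot branch on an encrypted
    # comparison, so it must visit all n records and combine them.
    def oblivious_lookup(enc_key, records, enc_eq, enc_mul, enc_add):
        acc = None
        for k, v in records:              # O(n): every record is touched
            sel = enc_eq(enc_key, k)      # encrypted 1 if keys match, else 0
            term = enc_mul(sel, v)        # v if match, encrypted 0 otherwise
            acc = term if acc is None else enc_add(acc, term)
        return acc                        # sum of all terms = the matched value

    # Plaintext stand-ins just to show the data flow (no real encryption):
    records = [(1, 10), (2, 20), (3, 30)]
    assert oblivious_lookup(2, records,
                            enc_eq=lambda a, b: int(a == b),
                            enc_mul=lambda s, v: s * v,
                            enc_add=lambda x, y: x + y) == 20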
Actually, there has been a theoretical breakthrough that makes search an O(log n) problem (https://eprint.iacr.org/2022/1703), but it is pretty impractical (and not getting much faster).
Good point. Note however that PIR is a rather restricted form of search (e.g., with no privacy for the server), but even so, DEPIR has polylog(n) queries (not log n), and requires superlinear preprocessing and a polynomial blowup in the size of the database. I think recent concrete estimates are around a petabyte of storage for a database of 2^20 words. So as you say, pretty impractical.
Even without bootstrapping, FHE will never be as fast as plaintext computation: the ciphertext is about three orders of magnitude larger than the plaintext it encrypts, which means you need more memory bandwidth and more compute. You can't bridge this gap.
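As a back-of-the-envelope illustration of that bandwidth gap (a sketch assuming the ~1000x expansion figure above and a 100 GB/s memory bus; the real numbers depend on the scheme and parameters):

    # Hypothetical numbers: ~1000x ciphertext expansion, 100 GB/s memory bus.
    plaintext_bytes = 1 * 10**6                      # 1 MB of plaintext
    expansion = 1000                                 # assumed ciphertext/plaintext ratio
    ciphertext_bytes = plaintext_bytes * expansion   # ~1 GB must be streamed
    mem_bw = 100 * 10**9                             # bytes per second

    print(plaintext_bytes / mem_bw)   # ~10 microseconds to stream the plaintext
    print(ciphertext_bytes / mem_bw)  # ~10 milliseconds for the ciphertext,
                                      # before any homomorphic compute happens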
Technically, there are rate-1 homomorphic encryption schemes, where ‘rate’ refers to the size ratio between the plaintext and the ciphertext. They’re not super practical, so your general point stands.
Oh, interesting. Can you point to a paper about one?
That actually sounds pretty reasonable and feels almost standard at this point?
To pick one out of a dozen possible examples: I regularly read 500-word news articles from 8 MB web pages with autoplaying videos, analytics beacons, and JS sludge.
That’s about 3 orders of magnitude for data and 4-5 orders of magnitude for compute.
Sure, but downloading a lot of data is not the same as computing on that data. With the web, you simply download the data and pass pointers to it around. With FHE, you have to compute on extremely large ciphertexts, using every byte of them. FHE means roughly 1000x more data to process, and it takes about 1000x more time.
I don't remember the last time I saw a news page that was <50 MB.
Don't you think there is a market for people who want services that have provable privacy even if it costs 1,000 times more? It's not as big a segment as Dropbox but I imagine it's there.
FHE solves privacy-from-compute-provider and doesn't affect any other privacy risks of the services. The trivial way to get privacy from the compute provider is to run that compute yourself - we delegate compute to cloud services for various reasonable efficiency and convenience reasons, but a 1000-fold less efficient cloud service usually isn't competitive with just getting a local device that can do that.
???
For the equivalent of $500 in credit you could self-host the entire thing!
You're not joking. If you're like most people and have only a few TiB of data in total, self-hosting on a NAS or spare PC is very viable. There are even products for non-technical people to set this up (e.g. software bundled with a NAS). The main barrier is having an ISP with a sufficient level of service.
The statements made in the linked description cannot be true, such as the claim that Google can neither read what you sent them nor what they responded with.
Having privacy is a reasonable goal, but VPNs and SSL/TLS provide enough for most, and at some point you're also just making yourself a target for someone with the power to undo your privacy and watch you more closely. Why else would you go through the trouble unless you had something to hide? It's the same story with Tor, VPN services, etc.: those can be compromised at will. Not to say you shouldn't use them if you need some level of security functionally, but no one with adequate experience believes in absolute security.
If we are talking 1000x more latency, that is a pretty hard sell.
Something that normally takes 30 seconds now takes over 8 hours.
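A quick check of that arithmetic:

    # 1000x slowdown applied to a 30-second job:
    print(30 * 1000 / 3600)  # 8.33... hours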
It's like how Python can be 400 times slower than C++, but people still use it.
Or more like: something that normally takes 50 ms, like an HTTP request, would take nearly a minute.
For LLM inference, the market that will pay $20,000 for what is now $20 is tiny.
There is: it's called governments. However, this technology is so slow that using it in mission-critical systems (think communication/coordinates during warfare) is not feasible, IMO.
The parent post is right: confidential compute is really what we've got.
Honestly, no? Unless you get everyone using said services, a market that is only viable for people trying to hide bad behavior becomes the place you look for people doing bad things.
This is a large part of why you have to convince people to hide things even if "they have nothing to hide."
For most, this would mean specially treating only a subset of all the sensitive data they have.
I get that there is a big LLM hype, but is there really no other application for FHE? Like, for example, trading algorithms (not the high-speed ones) that you can host on random servers knowing your stuff will be safe, or something similar?
I speak as someone who built trading algorithms (not the high-speed ones) for a living for several years, so I know that world pretty well. I highly doubt anyone who does that would host their stuff on random servers even with something like FHE. Why? Because it's not just the code that is confidential.
1) If you are a registered broker-dealer, you will just incur a massive amount of additional regulatory burden if you want to host this stuff on any sort of "random server."
2) Whoever you are, you need the pipe from your server to the exchange to be trustworthy, so no one can MITM your connection and front-run your (client's) orders.
3) This is an industry where, when people host servers in something like an exchange data center, it's reasonably common to put them in a locked cage to ensure physical security. No one is going to host on a server that could be physically compromised. Remember that big money is at stake, and data center staff typically aren't well paid (compared to someone working for an IB or hedge fund), so social engineering would be very effective if someone wanted to compromise your servers.
4) Even if you are able to overcome #1 and are very confident about #2 and #3, even slow market participants need predictable execution latency or they will be eaten for breakfast by the fast players [1]. You won't want to be on a random server controlled by anyone else in case they suddenly do something that affects your latency.
[1] For example, we used to have quite slow execution ability compared with HFTs and people who were co-located at exchanges, so we used to introduce delays when we routed orders to multiple exchanges so the orders would arrive at their destinations at precisely the same time. Even though our execution latency was high, this meant no-one who was colocated at the exchange could see the order at one exchange and arb us at another exchange.
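For illustration, a sketch of that delay-equalization trick; the venue names and one-way latencies below are made up:

    # Hold each order back so that every leg lands at the same instant.
    latencies_ms = {"exchange_A": 4.0, "exchange_B": 7.5, "exchange_C": 6.0}

    slowest = max(latencies_ms.values())
    delays_ms = {venue: slowest - lat for venue, lat in latencies_ms.items()}
    print(delays_ms)  # {'exchange_A': 3.5, 'exchange_B': 0.0, 'exchange_C': 1.5}
    # The slowest leg goes out immediately; the faster legs are held just
    # long enough that no colocated player sees one exchange's order early
    # enough to arb the others.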
But shouldn't proper FHE address most of these concerns? I mean, most of those extra measures exist exactly because physical access to the server means game over. With FHE, if the code is trusted, even tampering with the hardware should not compromise the software.
I encountered a situation where one company had the data, considered it really valuable, and did not want to show/share it. Another company had a model, which was considered very valuable, and did not want to show it. So they were stuck in a catch-22. Eventually they solved the perceived risk via contracts, but it could have been solved technically if FHE were viable.
I think the only thing that could make FHE truly world-changing is if someone figures out how to implement something like multi-party garbled circuits under FHE, where anyone can verify the output of functions over many hidden inputs, since that opens up a realm of provably secure HSMs, voting schemes, etc.
I'd also like to comment on how everything used to be a PCIe expansion card.
Your GPU was, and we also used to have dedicated math coprocessors. Now most of the expansion card tech is done by general-purpose hardware, which, while cheaper, will never be as good as a custom dedicated silicon chip focused on one task.
It's why I advocate for a separate ML/AI card instead of using GPUs. Sure, there is hardware architecture overlap, but you're sacrificing so much because your AI cards are founded on GPU hardware.
I'd argue the only real AI accelerators are something like what goes into modern SXM sockets. This ditches the power issues and opens up more bandwidth. However, only servers have SXM sockets... and those are not cheap.
> most of the expansion card tech is done by general-purpose hardware, which, while cheaper, will never be as good as a custom dedicated silicon chip focused on one task
I think one reason they can be as good as or better than dedicated silicon is that they can be adjusted on the fly. If a hardware bug is found in your network chip, too bad. If one is found in your software emulation of a network chip, you can update it easily. What if a new network protocol comes along?
Don't forget the design, verification, mask production, and other one-time costs of making a new type of chip are immense ($millions at least).
> It's why I advocate for a separate ML/AI card instead of using GPUs. Sure, there is hardware architecture overlap, but you're sacrificing so much because your AI cards are founded on GPU hardware.
I think you may have the wrong impression of what modern GPUs are like. They may be descended from graphics cards (as in, cards for rendering graphics), but today they are designed fully with the AI market in mind. And they are designed to strike an optimal balance between fixed functionality for super-efficient calculations that we believe AI will always need, and programmability to allow innovation in algorithms. Anything more fixed would be unviable immediately, because AI would have moved on by the time it could hit the market (and anything less fixed would be too slow).
Thx! I'm curious about your thoughts...
- FHE for classic key-value stores and simple SQL database tables?
- the author's argument that FHE is experiencing an accelerated Moore's law, and will therefore close the 1000x gap quickly?
Thx!
From your perspective: which FHE schemes are actually usable? Or is only PHE (partially homomorphic encryption) actually usable?
Interesting! Can you provide some sources for this claim?