Comment by jl6
3 years ago
In another comment you’ve said:
> With a proper implementation of PIR, the server still needs to scan through the entire encrypted dataset (this is unavoidable, otherwise its I/O patterns would leak information)
Is this technique therefore practical only when the server side dataset is relatively small (or full scans for every query are tolerable)?
(edit: sorry, misattributed the quote)
Wasn’t me, but it was accurate!
Indeed, except in some (exciting!) new theoretical constructions, server work is always linear in the database size.
However, I’d emphasize that our work shows that the throughput of this linear operation is really high (that is, the constant factor is small). We process at 300MB/s to 1.9GB/s on a single core, which is fast enough for relatively large databases. Remember that the computation is embarrassingly parallel, so you can always throw more compute at larger databases. To summarize, we think the speed is now fast enough that it really can be feasible to scan the whole database to answer a single query.
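To put rough numbers on that, here is a back-of-the-envelope sketch of the time for one full scan. The 10 GB database size, the 32-core count, and the assumption of perfect parallel scaling are illustrative and not from the paper; only the 300MB/s and 1.9GB/s per-core rates come from the comment above.

```python
MB = 10**6
GB = 10**9

def scan_seconds(db_bytes, bytes_per_sec_per_core, cores=1):
    """Time for one full linear scan, assuming perfect parallel scaling."""
    return db_bytes / (bytes_per_sec_per_core * cores)

db = 10 * GB  # hypothetical database size

slow_1 = scan_seconds(db, 300 * MB)             # low end of the quoted range, one core
fast_1 = scan_seconds(db, 1.9 * GB)             # high end of the quoted range, one core
slow_32 = scan_seconds(db, 300 * MB, cores=32)  # low end spread over 32 cores

print(f"1 core  @ 300MB/s: {slow_1:.1f} s")
print(f"1 core  @ 1.9GB/s: {fast_1:.1f} s")
print(f"32 cores @ 300MB/s: {slow_32:.2f} s")
```

Under these assumptions a single query costs somewhere between about 5 and 33 seconds of single-core time, and roughly a second when spread across 32 cores.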
> except in some (exciting!) new theoretical constructions
Sorry if this is too much of a tangent, but I would love to know what these are!
They are called PIR with sublinear online computation or offline/online PIR.
The key idea is that the client first issues a query that is independent of what they really want. This is the “offline” query. This query is linear (unavoidable), and the client gets some hints as a result.
Then later, the client can issue one or more queries for elements they want and those queries can be sublinear in terms of computation. The client uses hints to make that happen.
Some papers on this are:
https://eprint.iacr.org/2019/1075.pdf
https://eprint.iacr.org/2021/1438
https://eprint.iacr.org/2022/081.pdf
Sure! This paper has people really excited from a theoretical standpoint: https://eprint.iacr.org/2022/081.pdf. It’s a bit far from practical for single-server, but I’m sure people are working on it.
Clarification: that was my comment, not OP's. I'm not a cryptography expert, just an interested amateur. But my understanding is that O(n) query times are inevitable if you want information-theoretic security. Maybe it's possible to do better with a weaker security property.
And there are clever ways you can make a system like this "scale" even if the overall dataset size is limited. For instance, the authors cite another interesting paper [1] that uses a similar technique to build a fully-private voice chat system. The basic idea seems to be that you build a "database" consisting of the most recent snippet of audio from every ongoing conversation, and let each client privately query for the segment that's addressed to it. And every fraction of a second when new audio data arrives, you just throw away the old database and build a new one, so the amount of data doesn't depend on the length of the conversation.
Even if this doesn't scale to arbitrary numbers of users, it could still be used to run a "cell" of a few thousand people, in which it's not possible for an adversary to reconstruct communication patterns within the cell.
[1]: https://www.usenix.org/conference/osdi21/presentation/ahmad
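A minimal sketch of the "rebuild the database every round" idea, as I understand it from the description above. This is not the design from [1]: the slot layout, `SNIPPET_BYTES`, and all function names are hypothetical, and the selector is sent in the clear, so this toy provides no privacy at all. In a real PIR scheme the selector would be encrypted and the server would do the same pass homomorphically. The point is only that the server touches every slot for every query, and that the database size depends on the number of slots, not on how long anyone has been talking.

```python
from typing import Dict, List

SNIPPET_BYTES = 4  # hypothetical fixed snippet size

def build_round_db(latest: Dict[int, bytes], slots: int) -> List[bytes]:
    """Fresh database for one round: slot i holds the newest snippet for client i."""
    empty = bytes(SNIPPET_BYTES)
    return [
        latest.get(i, empty).ljust(SNIPPET_BYTES, b"\0")[:SNIPPET_BYTES]
        for i in range(slots)
    ]

def answer(db: List[bytes], selector: List[int]) -> bytes:
    """Server side: combine every slot with its selector bit (a full linear scan)."""
    out = bytearray(SNIPPET_BYTES)
    for row, bit in zip(db, selector):
        mask = -bit & 0xFF  # 0x00 when bit is 0, 0xFF when bit is 1
        for j, b in enumerate(row):
            out[j] ^= b & mask
    return bytes(out)

def fetch(db: List[bytes], my_slot: int) -> bytes:
    """Client side: one-hot selector (plaintext stand-in for an encrypted PIR query)."""
    selector = [1 if i == my_slot else 0 for i in range(len(db))]
    return answer(db, selector)

# Round 1: snippets arrive for clients 0 and 2.
round1 = build_round_db({0: b"hi", 2: b"yo"}, slots=4)
got1 = fetch(round1, 2)

# Round 2: the old database is thrown away and a new one is built.
round2 = build_round_db({2: b"sup"}, slots=4)
got2 = fetch(round2, 2)
got2_other = fetch(round2, 0)
```

Each round's database has exactly `slots` entries of `SNIPPET_BYTES` each, so the per-query scan cost stays constant however long a conversation runs.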
Maybe the I/O pattern could be hidden using confidential computing, like with a Nitro Enclave.
Maybe, but if you have a secure enclave that can be trusted not to leak data, then you don't really need PIR. You can just have clients encrypt their queries with a key that can only be decrypted by the code running in the enclave.