← Back to context

Comment by jl6

3 years ago

In another comment you’ve said:

> With a proper implementation of PIR, the server still needs to scan through the entire encrypted dataset (this is unavoidable, otherwise its I/O patterns would leak information)

Is this technique therefore practical only when the server side dataset is relatively small (or full scans for every query are tolerable)?

(edit: sorry, misattributed the quote)

Wasn’t me, but it was accurate!

Indeed, except in some (exciting!) new theoretical constructions, server work is always linear in the database size.

However, I’d emphasize that our work shows that the constabt factor on this linear operation is really high. We process at 300MB/s to 1.9GB/s on a single core, which is fast enough for relatively large databases. Remember that the computation is embarrassingly parallel, so you can always throw more compute at larger databases. To summarize, we think the speed is now fast enough that it really can be feasible to scan the whole database to answer a single query.

Clarification: that was my comment, not OP's. I'm not a cryptography expert, just an interested amateur. But my understanding is that O(n) query times are inevitable if you want information-theoretic security. Maybe it's possible to do better with a weaker security property.

And there are clever ways you can make a system like this "scale" even if the overall dataset size is limited. For instance, the authors cite another interesting paper [1] that uses a similar technique to build a fully-private voice chat system. The basic idea seems to be that you build a "database" consisting of the most recent snippet of audio from every ongoing conversation, and let each client privately query for the segment that's addressed to it. And every fraction of a second when new audio data arrives, you just throw away the old database and build a new one, so the amount of data doesn't depend on the length of the conversation.

Even if this doesn't scale to arbitrary numbers of users, it could still be used to run a "cell" of a few thousand people, in which it's not possible for an adversary to reconstruct communication patterns within the cell.

[1]: https://www.usenix.org/conference/osdi21/presentation/ahmad

Maybe the I/O pattern could be hideable using confidential computing, like with a Nitro Enclave.

  • Maybe, but if you have a secure enclave that can be trusted not to leak data, then you don't really need PIR. You can just have clients encrypt their queries with a key that can only be decrypted by the code running in the enclave.