Comment by gojomo

3 years ago

Interesting! But, it'd be helpful to clarify further the strength of the following claim:

> This demo allows private access to 6 GB (~30%) of English Wikipedia. In theory, even if the server is malicious, it will be unable to learn which articles you request. All article title searches are performed locally, and no images are available.

In this demo, the number of article-titles is relatively small – a few million – & enumerable.

If the server is truly malicious, and it issues itself requests for every known title, does it remain true that this "Private Information Retrieval" (PIR) scheme still gives it no hints that subsequent requests from others for individual articles retrieve particular data?

(Presumably: every request touches every byte of the same full 6GB of data, and involves every such byte in constant-run-time calculations that vary per request, and thus have the effect of returning only what each request wanted – but not at all in any way correlatable with other requests for the exact same article, from the same or different clients?)

11 comments

gojomo

blintz 3 years ago

Indeed! The encryptions of the client queries are fully semantically secure - under relatively solid lattice-based cryptographic assumptions, a server would need to do more than 2^128 work to recover the plaintext of someone’s query. One query for one item in the database is indistinguishable (without the client’s key) from another query for the same item later; in other words, it’s similar to something like the guarantee of CBC or GCM modes, where as long as you use it correctly, it is secure even if the attacker can see many encryptions of its choosing.

throwaway2016a 3 years ago

> without the client’s key
Thank you, these 4 words really helped with my understanding so I'm calling it out incase it helps others. So I was thinking, what prevents you from replaying the query and getting the same page back? But it seems the answer is: that would only produce a gibberish response because you don't have the key.
benlivengood 3 years ago
How indistinguishable is a Not Found result by the server? It seems like user behavior would be to re-request the article a second time, so the client should probably protect the user against this kind of server (which could bisect on article popularity to find the requested article in ~20 tries) by throwing up a warning about an article in the index not being retrievable.
- blintz 3 years ago
  
  Indeed, client behavior when an article fails to be retrieved is very important. A malicious server could guess what article is being retrieved, remove it from the database, and see if the client 'reacts' in some observable way.
  The mitigations for this are somewhat heuristic; the client should perhaps 'pace'/'debounce' subsequent requests when a retrieval fails. For example, enforce that the re-retrieval will happen at a uniformly random time between 1-5 seconds later.
  It's good to note that this kind of attack is somewhat unavoidable, since denial-of-service is (generally) always possible. An interesting mitigation might be for the server to produce a ZKP that the full dot product was computed correctly, and have the client always check the ZKP before displaying the article. This is mostly a theoretical fix for now I believe, since ZKP's over homomorphic computation are in the realm of theory more than practice.
  
  1 reply →
- lmm 3 years ago
  
  I think their model doesn't allow you to express such a thing? At implementation level the client requests an id in a fixed range and gets back a 100KB chunk corresponding to that id; if there's no article for that id then it'll just get a chunk of all 0s, and to anyone without the client's key that's indistinguishable from a chunk with content in it.
danuker 3 years ago
Would this be vulnerable to a side-channel attack as follows?
1. Record what item was retrieved from disk for a query
2. Run a dictionary through the query system, and see which item matches the record
- blintz 3 years ago
  
  The server processing has to scan the entire database to answer a query- so the entire database stays in memory, and access patterns tell you nothing about the query.
  
  1 reply →