Comment by lucb1e

3 years ago

Is it not possible to determine which article(s) the user downloaded based on the memory locations read? Of course, multiple small articles within the same 100 KB chunk can't be told apart, but for any medium-to-large article you'd be able to make a good guess (if there are a handful of candidate articles in that chunk) or get an exact match (if the chunk holds at most one article), no?

Or does the server go through a large chunk of its memory (say, at least a quarter of all of Wikipedia) and perform some oblivious computation on all of that data (applying the result modulo this 100KB return buffer)? That sounds very resource-intensive, at least for something large like Wikipedia (a doctor's office with some information pages of a few KB each could more easily do such a thing).
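It turns out the answer is yes: the server really does touch every record on every query. As a toy illustration of why that hides the access pattern, here is a sketch of classic two-server XOR-based PIR (a hypothetical example for intuition, not the single-server homomorphic scheme the article describes). Each server folds over the entire database per query, which is exactly the "scan everything" cost being asked about:

```python
import secrets

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def server_answer(db, selection_bits):
    # The server XOR-folds every record whose selection bit is 1.
    # Note it reads ALL records; which ones matter is hidden in the bits.
    acc = bytes(len(db[0]))
    for record, bit in zip(db, selection_bits):
        if bit:
            acc = xor_bytes(acc, record)
    return acc

def client_query(db_size, wanted_index):
    # Server A gets a uniformly random bit vector; server B gets the same
    # vector flipped at the wanted index. Each vector alone reveals nothing.
    mask_a = [secrets.randbits(1) for _ in range(db_size)]
    mask_b = list(mask_a)
    mask_b[wanted_index] ^= 1
    return mask_a, mask_b

# Hypothetical 8-record database with fixed-size (padded) records.
db = [f"article {i}".encode().ljust(16) for i in range(8)]
qa, qb = client_query(len(db), 5)
record = xor_bytes(server_answer(db, qa), server_answer(db, qb))
assert record == db[5]
```

The single-server scheme in the article replaces the two non-colluding servers with homomorphic encryption, but the per-query full-database scan is the same, which is why the compute cost scales with the database size rather than the response size.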

In the latter case, is each request unique (does it involve some sort of IV that the client can xor out of the data again) or could an index be built similar to a list of hashed PIN codes mapped back to plain text numbers?

Edit: I had already read some comments but just two comments further would have been my answer... :) https://news.ycombinator.com/item?id=31669924

> One query for one item in the database is indistinguishable (without the client’s key) from another query for the same item later; in other words, it’s similar to something like the guarantee of CBC or GCM modes, where as long as you use it correctly, it is secure even if the attacker can see many encryptions of its choosing.
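That indistinguishability is why the "hashed PIN rainbow table" attack doesn't apply: each query is freshly randomized, so the same index produces a different-looking query every time. A minimal sketch of the idea, using a hypothetical two-server-style query (not the paper's actual encryption):

```python
import secrets

def make_query(db_size, wanted_index):
    # Fresh randomness per query: each share is uniformly random on its own,
    # so no table mapping observed queries back to indices can exist.
    mask_a = [secrets.randbits(1) for _ in range(db_size)]
    mask_b = list(mask_a)
    mask_b[wanted_index] ^= 1
    return mask_a, mask_b

q1 = make_query(1024, 42)
q2 = make_query(1024, 42)
# Two queries for the SAME index are (with overwhelming probability)
# different random-looking vectors; every index is equally consistent
# with any single observed share.
print(q1[0][:8], q2[0][:8])
```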

That is some cool stuff indeed. I'm going to have to up my game when building or reviewing privacy-aware applications. Sure, a file sharing service is not going to practically allow this, but I'm sure that with this knowledge, I will come across places where it makes sense from both a usefulness (e.g. medical info) and practicality (data set size) perspective.

> Sure, a file sharing service is not going to practically allow this

Well, as the author points out here [0], it doesn't actually translate to exorbitant cost increases when almost all the cost is in bandwidth rather than compute. A file sharing service rather seems like an ideal example (assuming you can derive additional revenue from the privacy promises).

[0]: https://news.ycombinator.com/item?id=31673122

  • I'd be interested in the calculation there. How many movies does the server store in the first place? Computing over all of Netflix (just their movies, not even the series) for every request, even if you could then obtain the full movie with just one request, seems almost prohibitive. I'm also assuming the author priced the server as busy the whole time (on-demand pricing), rather than keeping spare capacity idle for when multiple requests come in at once.

    For me, compute has always been by far my biggest expense, and that's without homomorphic encryption! Buying a server or the monthly rental (e.g. a VPS) is where most of my budget goes. Next on the expense list are a few domain names, and probably about as much again in electricity if we're considering the at-home scenario. Bandwidth is usually included, be it with the VPS or with the server I run at home (since I want that internet uplink for normal use anyway).
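A rough back-of-envelope shows why the scan cost worries me. All numbers below are made up for illustration (hypothetical library size and per-core scan rate, not figures from the linked thread):

```python
# If every PIR query must stream the whole library through the CPU,
# per-query compute scales with library size, not with response size.
library_tb = 100        # assumed total movie library size, in TB
scan_rate_gbps = 10     # assumed per-core memory scan rate, in GB/s

seconds_per_query = library_tb * 1000 / scan_rate_gbps
print(f"{seconds_per_query:.0f} core-seconds per query")  # 10000
```

Under those (invented) assumptions, a single query costs thousands of core-seconds, so the "bandwidth dominates" argument only holds when the scanned database is small relative to what gets shipped out.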