
Comment by vishnumohandas

5 years ago

> So you're going to implement algorithms then?

Yes, we will implement the algorithms, purely on the client side, such that we don't hold indexes to your personal data.

But I understand how that piece of text could have thrown you off, I'll think of ways to rephrase it. Thanks for pointing it out.
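Here is a minimal sketch of what indexing "purely on the client side" could look like. It assumes a hypothetical `embed` function that stands in for an on-device model, and the `LocalIndex` class and sample file names are also illustrative, not ente's code. The embeddings stay in a local structure and search is a local nearest-neighbour lookup, so the server would never hold them. The upload and encryption step is deliberately left out.

```python
import math
from dataclasses import dataclass, field


def embed(features: list[float]) -> list[float]:
    """Hypothetical stand-in for an on-device model: L2-normalise a feature vector."""
    norm = math.sqrt(sum(v * v for v in features)) or 1.0
    return [v / norm for v in features]


@dataclass
class LocalIndex:
    """Index kept only on the device; nothing in here is sent to a server."""
    entries: dict[str, list[float]] = field(default_factory=dict)

    def add(self, file_id: str, features: list[float]) -> None:
        self.entries[file_id] = embed(features)

    def search(self, query: list[float], k: int = 3) -> list[str]:
        q = embed(query)
        # On normalised vectors, cosine similarity is just a dot product.
        scored = sorted(
            self.entries.items(),
            key=lambda item: sum(a * b for a, b in zip(q, item[1])),
            reverse=True,
        )
        return [file_id for file_id, _ in scored[:k]]


index = LocalIndex()
index.add("beach.jpg", [0.9, 0.1, 0.0])
index.add("forest.jpg", [0.1, 0.9, 0.2])
index.add("sunset.jpg", [0.8, 0.2, 0.1])
print(index.search([1.0, 0.0, 0.0], k=2))  # ['beach.jpg', 'sunset.jpg']
```

Search quality depends entirely on how good `embed` is. That is the accuracy trade-off the rest of the thread argues about.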

Actually I'm really curious how you do this. If the photos aren't stored client side, then how do you search? Do you have a thumbnail of every photo client side? Is that enough? I mean ImageNet scores are still pretty low for small/fast neural nets. And ImageNet isn't even representative of real world photos. So obviously to be successful you're going to have to continue training. So how do you do this in a privacy preserving way? Even federated learning can have some issues because images can be reconstructed from gradients.
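The simplest case shows why shared gradients can leak images: a single fully connected layer with a bias. For y = Wx + b, the weight gradient is the outer product of the error signal and the input (dL/dW = g xᵀ), and the bias gradient is the error signal itself (dL/db = g). Dividing any row of dL/dW by the matching entry of dL/db therefore returns the input x exactly. The toy sketch below uses a squared-error loss and made-up weights for illustration. Attacks on deep networks are iterative and usually only approximate.

```python
def forward(W, b, x):
    """Fully connected layer: y = Wx + b."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def gradients(W, b, x, target):
    """Gradients of L = 0.5 * ||y - target||^2 w.r.t. W and b (what a client would share)."""
    y = forward(W, b, x)
    g = [yi - ti for yi, ti in zip(y, target)]  # dL/dy
    dW = [[gi * xj for xj in x] for gi in g]    # dL/dW = g x^T
    db = g                                      # dL/db = g
    return dW, db


def reconstruct_input(dW, db):
    """Recover x from shared gradients: x_j = dW[i][j] / db[i] for any i with db[i] != 0."""
    i = max(range(len(db)), key=lambda k: abs(db[k]))  # use the largest bias gradient
    if db[i] == 0:
        raise ValueError("zero loss gradient: nothing to recover from")
    return [dw_ij / db[i] for dw_ij in dW[i]]


# "Private" input, e.g. four pixel intensities taken from a photo.
x_private = [0.2, 0.7, 0.1, 0.9]
W = [[0.5, -0.3, 0.8, 0.1], [-0.2, 0.4, 0.3, -0.6]]
b = [0.1, -0.1]
dW, db = gradients(W, b, x_private, target=[1.0, 0.0])
print(reconstruct_input(dW, db))  # matches x_private up to floating-point error
```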

  • > Do you have a thumbnail of every photo client side

    In the happy path the files/thumbnails are indexed before they are uploaded. But we are designing a framework that will pull files/thumbnails for indexing if they are unindexed or indexed by older models.

    > how do you do this in a privacy preserving way

    Our accuracy will not match that offered by services who index your data on their servers. But there's a trade off between user experience and privacy here, and we are hopeful that ente will be a viable option for an audience who is willing to sacrifice a bit of one for a lot of the other.

    • As someone who has worked on systems like these let me translate:

      “Your stuff will be private, but in return the accuracy will be so bad that the UX is gonna suck!”

      That’s the key piece people miss when they wanna do anything with ML: it’s a different problem from writing code, because it’s not about the code anymore, it’s about having great training data!


    • So I guess there is more to the question that I'm asking.

      > Our accuracy will not match that offered by services who index your data on their servers. But there's a trade off between user experience and privacy here,

      I think most people here understand that[0]. We are on Hacker News after all and not Reddit or a more general public place. The concern isn't that you are worse. The concern is that your product has to advance and get better over time. That mechanism is unclear and potentially concerning. The answer to this is the answer to how you ensure continued privacy.

      You talk about the "pull files/thumbnails for indexing" and this is what is most concerning to me and at the heart of my original question. How are you collecting those photos for _your_ training set? Obviously this isn't just ImageNet (dear god I hope not). Are you creating your own JFT-300M? Where are those photos being sourced from? What's the bias in that dataset? Obviously there are questions about the model too (CNNs and Transformers have different types of biases and see images differently). But that's a bigger question of training methods, and that gets complicated and nuanced fast. Obviously we know there is going to be some distillation going on.

      There are a lot of concerns here, and questions that won't really get asked of people who aren't pushing privacy-based apps. But the biggest question is how you get feedback into your model and improve it. Non-privacy-preserving apps are easier in this respect because you know which (real-world) examples you're failing on. But privacy-preserving methods don't have this feedback mechanism. We know homomorphic encryption isn't there yet, and we know there are concerns with federated learning (images can be recreated from gradients). So the question is: how are you going to improve your model in a privacy-preserving way?

      [0] I think people also understand that on-device NNs are going to be worse than server-side NNs, since there's a huge difference in the number of parameters and throughput between the two, and phone hardware can only do so much.

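The "distillation" mentioned above usually means training a small on-device student model to match the temperature-softened output distribution of a larger teacher model. Below is a minimal sketch of that standard loss: the KL divergence between softened softmaxes, scaled by T². It shows the general technique only. The logits are made up, and nothing here describes how ente actually trains its models.

```python
import math


def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=4.0):
    """KL(teacher || student) on temperature-softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


teacher = [4.0, 1.0, 0.5]  # large server-side model (illustrative numbers)
student = [2.0, 1.5, 0.2]  # small on-device model (illustrative numbers)
print(distillation_loss(teacher, student))
```

The loss is zero only when the student reproduces the teacher's softened distribution. The thread's open question is where the teacher's training data comes from, and the loss itself says nothing about that.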

You can run algorithms locally and still violate privacy by uploading private facts derived from the data with algorithms. Saying you won’t hold “indexes” doesn’t begin to cover it.

But that will mean that for every version of the algorithms, it will have to read all my photos from the past 15 years... my phone battery will die soon.

And if I need another kind of client, like a NAS, to do that... why do I need the cloud?

  • > phone battery will die soon

    Indexing will be opt-in. You will be able to, for instance, run the indexing only on your desktop client.

    > why do I need the cloud?

    So that you don't have to manage your own storage infrastructure? But if you would like to do that, then there are self-hosted alternatives that will better serve your use case.
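Both the battery concern and the reply above depend on not re-reading the whole library for every model release. Here is a minimal sketch of the selection step described earlier in the thread: index only the files that are unindexed or were indexed by an older model. It is combined with an opt-in, desktop-only switch. `IndexRecord`, `MODEL_VERSION`, `should_index_on` and `pending` are hypothetical names for illustration, not ente's actual code.

```python
from dataclasses import dataclass
from typing import Optional

MODEL_VERSION = 3  # hypothetical: version of the current on-device model


@dataclass
class IndexRecord:
    file_id: str
    indexed_with: Optional[int]  # model version used, or None if never indexed


def needs_indexing(record: IndexRecord, current: int = MODEL_VERSION) -> bool:
    """True if the file was never indexed or was indexed by an older model."""
    return record.indexed_with is None or record.indexed_with < current


def should_index_on(device: str, opted_in: bool, desktop_only: bool) -> bool:
    """Hypothetical policy: nothing runs unless the user opted in, and the work
    can optionally be restricted to desktop clients to spare phone batteries."""
    if not opted_in:
        return False
    return device == "desktop" or not desktop_only


def pending(records: list[IndexRecord], device: str, opted_in: bool, desktop_only: bool) -> list[str]:
    if not should_index_on(device, opted_in, desktop_only):
        return []
    return [r.file_id for r in records if needs_indexing(r)]


library = [
    IndexRecord("img_001.jpg", indexed_with=3),     # already current: skipped
    IndexRecord("img_042.jpg", indexed_with=2),     # older model: re-indexed
    IndexRecord("img_777.jpg", indexed_with=None),  # never indexed
]
print(pending(library, device="desktop", opted_in=True, desktop_only=True))  # ['img_042.jpg', 'img_777.jpg']
print(pending(library, device="phone", opted_in=True, desktop_only=True))    # []
```

A new model version still marks every older entry as stale. The sketch only avoids redundant work and lets that work move off the phone, which is the trade-off the reply describes.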

Agree with the above poster. I don't care about algorithms. I want algorithms. But I want algorithms that only work for me. Screw off everyone else.

Apple used to sell this. Then they stopped.