Comment by paxys

1 month ago

So basically - You take a picture. Apple encrypts it and uploads it to their server. The server matches the (still encrypted) picture to a database and tells your device "this picture contains the Eiffel Tower". Later when you search for Eiffel Tower on your device the photo pops up.

Is the complexity and security risk really worth it for such a niche feature?

It's also funny that Apple is simultaneously saying "don't worry the photo is encrypted so we can't read it" and "we are extracting data from the encrypted photo to enhance your experience".

They don’t send the photo. They send some encrypted metadata to which some noise is added. The metadata can be loosely understood as “I have this photo that looks sort of like this”. Then the server takes that encrypted data from the anonymized device and responds something like “that looks like the Eiffel Tower” and sends it back to the device. The actual photo never goes to the server.

  • With the added caveat that homomorphic encryption (HE) is the magic sauce here, so the server cannot see the metadata (cropped/normalized image data), and doesn't know how much it does or does not look like the Eiffel Tower.

They don’t send the photos. Nobody sent your photos anywhere, only certain metadata and its similarity vectors for matching purposes.

How can it tell that the picture contains the Eiffel Tower if the image is not decrypted?

  • Because it turns out that mathematicians and computer scientists have devised schemes that allow certain computational operations to be performed on encrypted data without revealing the data itself. The intuition is that you can compute a+b=c without revealing anything about what a and b are. This was mostly confined to the realm of theory and mathematics until fairly recently, but Apple has now operationalized it in a shipping consumer feature.
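To make the a+b=c intuition concrete, here is a minimal sketch using a toy Paillier cryptosystem, which is additively homomorphic. Paillier is swapped in only because it fits in a few lines; nothing in this thread confirms which scheme Apple uses. The tiny primes and the function names (`encrypt`, `decrypt`, `add_encrypted`) are illustrative assumptions and are nowhere near secure.

```python
import math
import secrets

# Toy Paillier parameters (assumed for illustration; far too small to be secure).
P, Q = 1789, 1861
N = P * Q                      # public modulus
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # private: lcm(P-1, Q-1)
MU = pow(LAM, -1, N)           # private: inverse of LAM mod N (valid for g = N + 1)


def encrypt(m):
    """Encrypt integer m (0 <= m < N) using only the public modulus N."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    # With g = N + 1, g**m mod N^2 equals 1 + m*N.
    return (1 + m * N) * pow(r, N, N2) % N2


def decrypt(c):
    """Decrypt with the private values LAM and MU."""
    return (pow(c, LAM, N2) - 1) // N * MU % N


def add_encrypted(c1, c2):
    """Multiplying ciphertexts adds the hidden plaintexts (mod N)."""
    return c1 * c2 % N2


# Whoever holds only the ciphertexts can add them without learning a or b.
c = add_encrypted(encrypt(20), encrypt(22))
print(decrypt(c))  # 42
```

The party doing the addition only ever touches `c1` and `c2`; the result is readable only by whoever holds `LAM` and `MU`.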

Is this a niche feature? I use this kind of search very often in my photos.

  • What are some example keywords? I have never searched for landmarks, I only search for location.

    How many landmarks are there to justify sending your data off? Can't the database be stored on the device?

    • cat, pizza, dog, red car, Tokyo, dolphins, garden

      The usual context is "oh I saw some dolphins forever ago. let me see if I can find the photos..."

    • One time in Paris I received a crème brûlée that strongly resembled the face of Jesus Christ, so naturally I took a picture before devouring it.

      Last night I was able to find the image using the single word "creme". Definitely saved the story.

Not really. It's more like Apple runs a local algorithm that takes your picture of the Eiffel Tower and outputs some text, "Eiffel Tower, person smiling", then encrypts that text and sends it securely to Apple's servers to help you when you perform a search.

  • OP was wrong, but this is even wronger

    Locally, a small ML model identifies potential points of interest (POIs) in an image.

    Another model turns these regions into a series of numbers (a vector) that represent the image. For instance, one number might correlate with how "skyscraper-like" the image is. (We don't actually know the definition of each dimension of the vector, but we can turn an image that we know is the Eiffel Tower into a vector, and measure how close our reference image and our sample image are to each other.)
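The comparison of a sample vector against a reference vector can be sketched in a few lines. This is a minimal illustration only: the four-dimensional vectors and their values are made up, real embeddings have many more dimensions, and cosine similarity is assumed here as the closeness measure.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 means 'points the same way'."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings; the numbers are invented for illustration.
reference_eiffel = [0.9, 0.1, 0.0, 0.2]   # vector of a known Eiffel Tower image
reference_taj = [0.0, 0.2, 0.9, 0.1]      # vector of a known Taj Mahal image
sample = [0.8, 0.2, 0.1, 0.2]             # vector of the user's photo

print(cosine_similarity(sample, reference_eiffel))
print(cosine_similarity(sample, reference_taj))
```

The sample scores much higher against the first reference than the second, which is all "looks sort of like the Eiffel Tower" means at this level.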

    The thing is, we aren't storing this database with the vectors of all known locations on our phone. We could send the vector we made on device off to Apple's servers. The vector is lossy, after all, so Apple wouldn't have the image. If we did this, however, Apple would know that we have an image of the Eiffel Tower.

    So, this is the magic part. The device encrypts the vector using a private key known only to it, then sends this unreadable vector off to the server. Somehow, using homomorphic encryption and other processes I do not understand, mathematical operations like cosine similarity can be applied to this encrypted vector without reading its actual contents. Each of these operations changes the value, which stays encrypted, so the server does not know how the value changed.

    I don't know if this is exactly what Apple does, and I think they have more efficient ways, but theoretically you could apply each row in your database to this encrypted value in such a way that the encrypted value becomes the name of the POI of the best match, or otherwise junk is appended (completely changing the encrypted value). Again, the server has not read the encrypted value, and it does not know which row won out. Only the client will know, once it decrypts the new value.
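The row-by-row idea can be sketched with a toy example. This is a simpler variant than the one described above, and not a claim about Apple's protocol: the server returns one encrypted score per database row and the client decrypts them and picks the best, rather than the server folding everything into a single encrypted name. It assumes a toy Paillier scheme with insecure, tiny primes, small non-negative integer vectors, and a plain dot product standing in for cosine similarity. `DATABASE` and every function name are hypothetical.

```python
import math
import secrets

# Toy Paillier parameters (illustration only; not secure).
P, Q = 1789, 1861
N, N2 = P * Q, (P * Q) ** 2
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # client-only secret
MU = pow(LAM, -1, N)                               # client-only secret


def encrypt(m):
    """Public-key encryption of a small non-negative integer."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (1 + m * N) * pow(r, N, N2) % N2


def decrypt(c):
    """Client side only: needs LAM and MU."""
    return (pow(c, LAM, N2) - 1) // N * MU % N


# Hypothetical server-side database of quantized landmark vectors.
DATABASE = {
    "Eiffel Tower": [9, 1, 0, 2],
    "Golden Gate Bridge": [1, 8, 3, 0],
    "Taj Mahal": [0, 2, 9, 1],
}


def encrypted_dot(enc_query, row):
    """Server side: build Enc(sum(q_i * row_i)) without decrypting any q_i.

    Raising a ciphertext to a plaintext power multiplies the hidden value;
    multiplying ciphertexts adds the hidden values.
    """
    acc = encrypt(0)
    for c, weight in zip(enc_query, row):
        acc = acc * pow(c, weight, N2) % N2
    return acc


def server_answer(enc_query):
    """The server never sees the query vector or any of the scores."""
    return {name: encrypted_dot(enc_query, row) for name, row in DATABASE.items()}


def client_pick(enc_scores):
    """Only the client can decrypt the scores and learn which row won."""
    scores = {name: decrypt(c) for name, c in enc_scores.items()}
    return max(scores, key=scores.get)


enc_query = [encrypt(x) for x in [8, 2, 1, 2]]   # client encrypts its vector
print(client_pick(server_answer(enc_query)))      # Eiffel Tower
```

In this variant the client learns a score for every row, which a real deployment would likely want to avoid; it is meant only to show that the server can do the matching arithmetic blind.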