Comment by paxys

10 months ago

So basically - You take a picture. Apple encrypts it and uploads it to their server. The server matches the (still encrypted) picture to a database and tells your device "this picture contains the Eiffel Tower". Later when you search for Eiffel Tower on your device the photo pops up.

Is the complexity and security risk really worth it for such a niche feature?

It's also funny that Apple is simultaneously saying "don't worry the photo is encrypted so we can't read it" and "we are extracting data from the encrypted photo to enhance your experience".


Arn_Thor 10 months ago

They don’t send the photo. They send some encrypted metadata to which some noise is added. The metadata can be loosely understood as “I have this photo that looks sort of like this”. Then the server takes that encrypted data from the anonymized device and responds something like “that looks like the Eiffel Tower” and sends it back to the device. The actual photo never goes to the server.

dwaite 10 months ago

With the added caveat that HE is magic sauce - so the server cannot see the metadata (cropped/normalized image data), and doesn't know how much it does or does not look like the Eiffel Tower.

sinuhe69 10 months ago

They don’t send the photos. Nobody sends your photos anywhere; only certain metadata and its similarity vectors are sent, for matching purposes.

goodluckchuck 10 months ago

How can it tell the picture contains the Eiffel Tower if the image is not decrypted?

ls612 10 months ago
Because it turns out that mathematicians and computer scientists have devised schemes that allow certain computational operations to be performed on encrypted data without revealing the data itself. The intuition is that you can compute a+b=c without revealing anything about what a and b are. This has been mostly confined to the realm of theory and mathematics until very recently but Apple has operationalized it for the first time.
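
A minimal sketch of that a+b=c intuition, using a toy Paillier cryptosystem. Paillier is picked here only because it is a well-known additively homomorphic scheme; nothing in this thread says it is the scheme Apple uses. The hard-coded primes and the function names (`keygen`, `encrypt`, `decrypt`, `add_encrypted`) are assumptions made for illustration, and the key size is far too small to be secure.

```python
import math
import secrets

def keygen(p=1789, q=1867):
    """Toy Paillier key pair. The tiny hard-coded primes are for illustration only."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse (Python 3.8+); valid because g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) under public key n, with generator g = n + 1."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # fresh randomness for every ciphertext
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    """Recover the plaintext using the private values (lam, mu)."""
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n

def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the hidden plaintexts. No private key is needed."""
    return (c1 * c2) % (n * n)

n, priv = keygen()
ca, cb = encrypt(n, 20), encrypt(n, 22)   # done on the device
c_sum = add_encrypted(n, ca, cb)          # done by a server that never sees 20 or 22
print(decrypt(n, priv, c_sum))            # prints 42
```

The party calling `add_encrypted` only ever handles ciphertexts and the public modulus; only the holder of `priv` can read the result.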
- Klonoar 10 months ago
  
  It’s not the first time Apple operationalized it, they did it for Caller ID a while back.
- CodeWriter23 10 months ago
  
  And then when the system does the computation to determine your location (wait.what?)
  

RandallBrown 10 months ago

Is this a niche feature? I use this kind of search very often in my photos.

aeyes 10 months ago
What are some example keywords? I have never searched for landmarks, I only search for location.
How many landmarks are there to justify sending your data off? Can't the database be stored on the device?
- sprayk 10 months ago
  
  cat, pizza, dog, red car, Tokyo, dolphins, garden
  The usual context is "oh I saw some dolphins forever ago. let me see if I can find the photos..."
  
- mp05 10 months ago
  
  One time in Paris I received a crème brûlée that strongly resembled the face of Jesus Christ, so naturally I took a picture before devouring it.
  Last night I was able to find the image using the single word "creme". Definitely saved the story.

IncreasePosts 10 months ago

Not really. It's more like Apple runs a local algorithm that takes your picture of the Eiffel Tower, and outputs some text "Eiffel Tower, person smiling", and then encrypts that text and sends it securely to Apple's servers to help you when you perform a search.

internetter 10 months ago

OP was wrong, but this is even wronger
Locally, a small ML model identifies potential POIs in an image.
Another model turns these regions into a series of numbers (a vector) that represent the image. For instance, one number might correlate with how "skyscraper-like" the image is. (We don't actually know the definition of each dimension of the vector, but we can turn an image that we know is the Eiffel Tower into a vector, and measure how close our reference vector and our sample vector are.)
The thing is, we aren't storing this database with the vectors of all known locations on our phone. We could send the vector we made on device off to Apple's servers. The vector is lossy, after all, so Apple wouldn't have the image. If we did this, however, Apple would know that we have an image of the Eiffel Tower.
So, this is the magic part. The device encrypts the vector using a private key known only to it, then sends this unreadable vector off to the server. Somehow, using Homomorphic Encryption and other processes I do not understand, mathematical operations like cosine similarity can be applied to this encrypted vector without reading the actual contents of the vector. Each one of these operations changes the value, which is still encrypted, but we do not know how the value changed.
I don't know if this is exactly what Apple does (I think they have more efficient ways), but theoretically what you could do is apply each row in your database to this encrypted value, in such a way that the encrypted value becomes the name of the POI of the best match, or otherwise junk is appended (completely changing the encrypted value). Again, the server has not read the encrypted value, and it does not know which row won out. Only the client will know when it decrypts the new value.
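
To make the matching step concrete, here is a rough sketch of the plaintext version of that lookup: compare a query vector against every row of a small reference table with cosine similarity and keep the best match. In the scheme described above, the same arithmetic would be carried out on an encrypted query, so the server would not learn the scores or the winner. The three-dimensional vectors, the `LANDMARKS` table, and the function names are all made up for illustration; real embeddings have many more dimensions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical reference database: landmark name -> embedding vector.
LANDMARKS = {
    "Eiffel Tower": [0.9, 0.1, 0.3],
    "Golden Gate Bridge": [0.2, 0.8, 0.5],
    "Taj Mahal": [0.1, 0.3, 0.9],
}

def best_match(query):
    """Return the landmark whose vector is most similar to the query vector."""
    return max(LANDMARKS, key=lambda name: cosine_similarity(query, LANDMARKS[name]))

# Hypothetical vector produced on the device from a photo region.
query = [0.85, 0.15, 0.25]
print(best_match(query))  # prints Eiffel Tower
```

Done in the clear like this, the server sees both the query and the result, which is exactly the leak that encrypting the vector is meant to avoid.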