Comment by tjmc
4 years ago
So that picture of my driver's license I took for an ID check or that sensitive work document I scanned with my phone are just as likely to be sent? Great.
The image would need to be vaguely similar in terms of gross shapes and arrangement. It's exceedingly unlikely that any CSAM would ever be remotely similar to an ID card or a sheet of paper.
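To make "similar in terms of gross shapes and arrangement" concrete, here is a minimal sketch using a simple average hash. This is not Apple's NeuralHash; it is a much cruder perceptual hash swapped in for illustration, and the toy 4x4 images and function names are made up for the example.

```python
def average_hash(pixels):
    """Return one bit per pixel: 1 if the pixel is brighter than the image mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]


def hamming(a, b):
    """Count the positions where two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical 4x4 grayscale images (0 = black, 255 = white).
photo = [[200, 200, 10, 10],
         [200, 200, 10, 10],
         [10, 10, 200, 200],
         [10, 10, 200, 200]]

# Same coarse layout as `photo`, with slightly different brightness values.
similar = [[190, 210, 20, 5],
           [205, 195, 15, 12],
           [8, 14, 210, 190],
           [12, 9, 195, 205]]

# A bright page with one dark band, standing in for a scanned document.
document = [[250, 250, 250, 250],
            [40, 40, 40, 40],
            [250, 250, 250, 250],
            [250, 250, 250, 250]]

print(hamming(average_hash(photo), average_hash(similar)))
print(hamming(average_hash(photo), average_hash(document)))
```

In this toy case the two images with the same coarse layout hash identically, while the document-like image differs in half of its bits. Real perceptual hashes are far more elaborate, but the idea that only the coarse structure survives is the same.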
If there are ever going to be any "natural" matches to any CSAM hashes, it's probably going to be a photograph of people who are coincidentally in a similar pose, at a nearly identical angle, and with strikingly similar shading.
In the myriad articles about this system's many issues, there have been comments from people who have worked with the upstream NCMEC database and who note that it's filled with mundane photos, empty rooms, etc. I think it was in one of the hackerfactor article discussions.
This entire system is ripe for false positives AND adversarial attacks.
I've no doubt the totality of the database contains a lot of photos, but only photos tagged as A1, A2, B1, or B2 would be considered illegal to possess. And then only the absolute worst of the worst (images categorised as "A1") are being included in the hash set on iOS. The category definitions are:
The categories are described in further detail (ugh) in this PDF, page 22: https://www.prosecutingattorneys.org/wp-content/uploads/Pres...
The NCMEC database is large and graded to distinguish types of photos. There’s evidence in the false positive calculations that Apple is only using a subset, presumably the one where photos are graded as depicting active abuse.
It’s not reasonable to dispute the 1 in 1e12 false positive claim on mere speculation.
The chance that any pictures from your library are revealed at all is at most one in one trillion (assuming you are not storing CSAM and are not being attacked by someone trying to plant evidence on you). Contrast this with a server-side scanning system, where every photo in your library will be accessed with unknown false positive characteristics.
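As a rough sketch of how an account-level figure can be far smaller than a per-image false match rate: assume a report needs several independent false matches in one library, not just one. The comment above does not state that, so it is an assumption here, and the library size, per-image rate, and thresholds below are hypothetical numbers chosen only for illustration, not Apple's.

```python
import math


def prob_at_least(n, p, t):
    """P(X >= t) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    total = 0.0
    for k in range(t, n + 1):
        # log of C(n, k) * p^k * (1 - p)^(n - k)
        log_term = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                    + k * math.log(p) + (n - k) * math.log1p(-p))
        total += math.exp(log_term)
    return total


# Hypothetical values, not taken from the thread or from Apple.
LIBRARY_SIZE = 10_000      # photos in one library
PER_IMAGE_RATE = 1e-6      # chance that a single innocent photo falsely matches

for threshold in (1, 5, 30):
    print(threshold, prob_at_least(LIBRARY_SIZE, PER_IMAGE_RATE, threshold))
```

The calculation treats false matches as independent, which an adversarial attack would deliberately violate, so it says nothing about planted images.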