Unless I'm missing something, those are just theoretical examples of how one could deliberately try to find hash collisions, using a different, simpler perceptual hash function: https://twitter.com/matthew_d_green/status/14230842449522892...
So, it's theoretical, it's a different algorithm, and it's a case where someone is specifically trying to find collisions via machine learning. (Perhaps by "reversing" the hash back to something similar to the original content.)
The two posters above claim they saw cases where the actual official CSAM hash algorithm produced false positive matches on benign files that happened to be on a hard drive; not anything deliberately crafted to collide with the hashes.
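(For anyone unfamiliar with how coarse these perceptual hashes are, here's a toy sketch in the spirit of a simple "average hash". This is purely my own illustration, not the private algorithm discussed above and not the one in the linked tweet; it just shows why visually unrelated images can end up with matching or near-matching hashes by accident, without anyone crafting a collision. The filenames are made up.)

    # Toy "average hash" sketch (illustration only, not the real algorithm)
    from PIL import Image

    def average_hash(path, hash_size=8):
        # Shrink to 8x8 grayscale and set each bit to whether that pixel is
        # brighter than the mean. Almost all detail is discarded, so
        # unrelated images can share (or nearly share) a hash by chance.
        img = Image.open(path).convert("L").resize((hash_size, hash_size))
        pixels = list(img.getdata())
        avg = sum(pixels) / len(pixels)
        bits = "".join("1" if p > avg else "0" for p in pixels)
        return int(bits, 2)

    def hamming_distance(h1, h2):
        # Perceptual matchers typically flag "near" hashes, not only exact
        # ones, which further widens the space of accidental matches.
        return bin(h1 ^ h2).count("1")

    # e.g. hamming_distance(average_hash("beach_ball.jpg"),
    #                       average_hash("unrelated_photo.jpg"))
    # can come out small purely by coincidence.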
You're not missing something, but you're not likely to get real examples. As I understand it, the algorithm and database are private, and the posters above are just commenting guardedly with (claimed) insider knowledge, so they're unlikely to want to leak examples. (And it's not only that it's private; given the supposed contents, would you really want to be the one saying 'but it isn't, look'? Would you trust someone who did, and follow such a link to see for yourself?)
To be clear, I definitely didn't want examples in the form of links to the actual content, just a general description. For instance, was a beach ball misclassified as a heinous crime, was it perfectly legal consensual porn with adults that was misclassified, or was it something that even a human could plausibly mistake for CSAM? Or something else entirely?
I understand that they may not want to give examples, perhaps for professional or legal reasons, and I can respect that. But I also think that information is very important if they're trying to argue one side of the debate.