
Comment by GistNoesis

11 hours ago

If it's not done properly, and at any point in the chain you happen to put black blocks on a compressed image (and PDFs do compress internal images), you are leaking some bits of information in the shadow cast by the compression algorithm. (Self-plug: https://github.com/unrealwill/jpguncrop)
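
To see how visible pixels can leak what's under the box, here is a minimal sketch in Python. It uses a crude JPEG stand-in: an orthonormal 8x8 DCT with a single hypothetical quantization step of 16 instead of a real quantization table. It builds two blocks that differ only in the two columns a black box will later cover, compresses both, then paints the box. This illustrates the general principle, not how jpguncrop itself works.

```python
import math

N = 8   # JPEG works on 8x8 blocks
Q = 16  # hypothetical uniform quantization step (real JPEG uses a per-coefficient table)

# Orthonormal DCT-II basis matrix C, so that coeffs = C @ X @ C^T and X = C^T @ coeffs @ C
C = [[(math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
      * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
      for n in range(N)] for k in range(N)]

def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def jpeg_like_roundtrip(block):
    """DCT, quantize, dequantize, inverse DCT: a crude stand-in for JPEG."""
    coeffs = matmul(matmul(C, block), transpose(C))
    quant = [[round(c / Q) * Q for c in row] for row in coeffs]
    return matmul(matmul(transpose(C), quant), C)

HIDDEN_COLS = {6, 7}  # columns the black box will cover

# Two originals that differ ONLY in the columns the black box hides
a = [[128.0] * N for _ in range(N)]
b = [[255.0 if j in HIDDEN_COLS else 128.0 for j in range(N)] for _ in range(N)]

ra, rb = jpeg_like_roundtrip(a), jpeg_like_roundtrip(b)

# Redact AFTER compression: paint the hidden columns black
for img in (ra, rb):
    for row in img:
        for j in HIDDEN_COLS:
            row[j] = 0.0

# Visible pixels still differ, so they carry information about the hidden ones
leak = max(abs(ra[i][j] - rb[i][j])
           for i in range(N) for j in range(N) if j not in HIDDEN_COLS)
print(f"max difference in visible pixels: {leak:.1f}")
```

The quantization error of every DCT coefficient is spread over the whole 8x8 block, so the pixels beside the box end up depending on the pixels under it whenever the box covers only part of a block.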

And that's just in the non-adversarial simple case.

If you don't know the provenance of the images you are putting black boxes on (for example, because a rogue employee intentionally wants to leak them, or because another team compromised your target's image sensor to leak some info), your redaction can be rendered ineffective, since some images can be made uncroppable by construction.

(Self-plug: https://github.com/unrealwill/uncroppable)
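
A toy version of "uncroppable by construction", sketched in Python. It is hypothetical and much simpler than whatever the uncroppable repo does: a 1-D grayscale "image" hides two copies of a 16x-downsampled thumbnail in its least-significant bits, half an image apart, so a black box covering at most half the pixels always leaves one copy of every thumbnail bit intact. It assumes lossless storage (LSBs do not survive JPEG) and that the box position is known, since the box is visible.

```python
FACTOR = 16  # downsampling factor: 2 copies x 8 bits / 16 pixels = 1 LSB per pixel

def embed_thumbnail(pixels):
    """Hide two copies of a low-res thumbnail in the pixels' least-significant bits."""
    n = len(pixels)
    assert n % (2 * FACTOR) == 0, "toy layout needs a length divisible by 32"
    half = n // 2
    # Thumbnail: mean of each run of FACTOR pixels, as 8-bit values
    thumb = [sum(pixels[i:i + FACTOR]) // FACTOR for i in range(0, n, FACTOR)]
    bits = [(v >> b) & 1 for v in thumb for b in range(8)]  # exactly `half` bits
    out = list(pixels)
    for i, bit in enumerate(bits):
        for pos in (i, i + half):  # copy 1 in the first half, copy 2 in the second
            out[pos] = (out[pos] & ~1) | bit
    return out

def unredact(redacted, start, end):
    """Fill the black box [start, end) from whichever thumbnail copy survived."""
    n = len(redacted)
    half = n // 2
    assert end - start <= half, "box too large: both copies of some bits are lost"
    bits = []
    for i in range(half):
        src = i if not (start <= i < end) else i + half
        bits.append(redacted[src] & 1)
    thumb = [sum(bits[k * 8 + b] << b for b in range(8)) for k in range(n // FACTOR)]
    restored = list(redacted)
    for p in range(start, end):
        restored[p] = thumb[p // FACTOR]  # nearest-neighbour upscale of the thumbnail
    return restored

original = [200] * 64
original[16:32] = [30] * 16                       # a dark "secret" feature
marked = embed_thumbnail(original)                # each pixel changes by at most 1
redacted = marked[:16] + [0] * 16 + marked[32:]   # black box over the feature
restored = unredact(redacted, 16, 32)
print(restored[16:32])  # [30, 30, ..., 30]
```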

Also be aware that compression is hiding everywhere: https://en.wikipedia.org/wiki/Compressed_sensing

  • Right, using steganography to encode some parity bits into an image so that lost information can be reconstructed seems like an obvious approach; there are all sorts of approaches you could use, akin to FEC (forward error correction). I haven't looked at your site yet, but I'll be interested to see what you've built :)

    Edit: I checked it out. Nice, I like the lower-res steganography approach; it can work very nicely with good upscaling filters. Gave it a star :)

  • >Let's crop it anyway

    That is not cropping.

    https://en.wikipedia.org/wiki/Cropping_(image)

    >Cropping is the removal of unwanted _outer_ areas from a photographic or illustrated image.
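
The parity-bit idea from the first reply can be sketched in Python as a single XOR parity chunk, the simplest erasure code (the same idea as RAID 5): any one lost chunk is rebuilt from the others. Where the chunks would live in an image (say, the LSBs of different regions) and the payload below are assumptions for illustration.

```python
from functools import reduce

def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def add_parity(chunks):
    """Append one parity chunk: the XOR of all (equal-length) data chunks."""
    return chunks + [reduce(xor_bytes, chunks)]

def recover(stored, lost_index):
    """Rebuild any single lost chunk (data or parity) by XOR-ing all the others."""
    survivors = [c for i, c in enumerate(stored) if i != lost_index]
    return reduce(xor_bytes, survivors)

# Hypothetical payload, e.g. bits destined for three different image regions
data = [b"RED", b"ACT", b"ED!"]
stored = add_parity(data)
print(recover(stored, 1))  # b'ACT'
```

One parity chunk survives exactly one blacked-out region; surviving more needs a stronger code such as Reed-Solomon.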

Somewhat related: I once sent an FOI request to a government agency that had decided the most secure way to redact documents was to print them, use a permanent marker, and then scan them. Unfortunately, they used dye-based markers over laser print, so simply throwing the document into Photoshop and turning up the contrast made it readable.
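
Why turning up the contrast works, as a Python sketch: the dye leaves a small but nonzero brightness gap between marker over toner and marker over bare paper, and a linear contrast stretch blows that gap up to nearly the full range. The scan-line values and the stretch window are made up for illustration, not measured.

```python
def stretch(pixels, lo, hi):
    """Linearly map [lo, hi] to [0, 255], clipping values outside that window."""
    scale = 255 / (hi - lo)
    return [max(0, min(255, round((p - lo) * scale))) for p in pixels]

# Hypothetical scan line across a redacted word: dye over bare paper ~70,
# dye over laser toner ~45 (the toner stays slightly darker under the dye)
scan = [70, 71, 69, 45, 44, 46, 70, 70, 45, 46, 69, 70]
enhanced = stretch(scan, lo=40, hi=75)
print(enhanced)
```

A gap of about 25 grey levels, easy to miss by eye, becomes a gap of well over 150 after the stretch.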

  • I remember noticing that a teacher in high school had used white-out to hide the marks for the correct multiple-choice answers on final exam practice questions before copying them. Then she literally cut and pasted questions from the practice questions into the final. My essay was mediocre, but I got the highest score in the class on the multiple-choice questions, because I could see little black dots where the white-out had been used.

I thought I understood what was going on, but then I came to the image showing the diff, and I don't understand at all how that diff can unredact anything.

  • It's not that you can unredact them from scratch (you could never get the blue circle back from this software). It's that you can tell which redacted image corresponds to which original image. Investigative teams often find themselves in a situation where they have all four images but need to work out which redacted file came from which original. Take, for example, a case where headed paper is otherwise entirely redacted.

    So with this technique, you can definitively say "Redacted-file-A is definitely a redacted version of Origin-file-A". Super useful for identifying forgeries in a stack of otherwise legitimate files.

    It's also good for saying "the date on Origin-file-B is 1993, and the file you've presented as evidence is provably Origin-file-B, so you definitely knew of [whatever event] in 1993".
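
A minimal sketch of that matching step in Python, assuming flat grayscale pixel lists of equal size, pure-black (0) redaction boxes, and hypothetical file names. It scores each candidate original only on the pixels the redaction left visible and picks the closest. This is a generic approach, not necessarily the technique behind the diff image in the article.

```python
def mismatch(redacted, original):
    """Mean absolute difference over pixels the redaction left visible.

    Assumes redaction boxes are pure black (0), so any 0 pixel is skipped.
    """
    diffs = [abs(r - o) for r, o in zip(redacted, original) if r != 0]
    return sum(diffs) / len(diffs) if diffs else float("inf")

def best_match(redacted, originals):
    """Return the name of the candidate original that best explains `redacted`."""
    return min(originals, key=lambda name: mismatch(redacted, originals[name]))

# Hypothetical 1-D "documents": identical letterhead, slightly different details
originals = {
    "Origin-file-A": [10, 20, 30, 40, 50, 60, 70, 80],
    "Origin-file-B": [10, 20, 30, 99, 50, 60, 71, 80],
}
redacted = [10, 20, 30, 0, 0, 60, 71, 80]  # Origin-file-B with a black box
print(best_match(redacted, originals))  # Origin-file-B
```

With lossy re-compression the true source will not score exactly 0, but it should still score lowest, which is what makes the attribution work.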