Comment by OneMorePerson

10 hours ago

It's funny seeing this play out because in my personal life anytime I'm sharing a sensitive document where someone needs to see part of it but I don't want them to see the rest that's not relevant, I'll first block out/redact the text I don't want them to see (covering it, using a redacting highlighter thing, etc.), and then I'll screenshot the page and make that image a PDF.

I always felt paranoid (without any real evidence, just a guess) that there would always be a chance that anything done in software could be reversed somehow.

If it's not done properly, and you happen at any point in the chain to put black blocks on a compressed image (and PDFs do compress internal images), you are leaking some bits of information in the shadow cast by the compression algorithm. (Self-plug: https://github.com/unrealwill/jpguncrop )

  • And that's just in the non-adversarial simple case.

    If you don't know the provenance of the images you are putting black boxes on (for example because a rogue employee intentionally wants to leak them, or because the image sensor of your target has been compromised by another team to leak some info), your redaction can be rendered ineffective, as some images can be made uncroppable by construction.

    (Self-plug: https://github.com/unrealwill/uncroppable )

    And also be aware that compression is hiding everywhere: https://en.wikipedia.org/wiki/Compressed_sensing

    • Right, using steganography to encode some parity bits into an image so that lost information can be reconstructed seems like an obvious approach. There are all sorts of approaches you could use, akin to FEC. Haven't looked at your site yet, will be interested to see what you've built :)

      Edit: I checked it out, nice, I like the lower-res steganography approach, it can work very nicely with good upscaling filters. Gave it a star :)

  • Somewhat related, I once sent a FOI request to a government agency that decided the most secure way to redact documents was to print them, use a permanent marker, and then scan them. Unfortunately they used dye-based markers over laser print, so simply throwing the document into Photoshop and turning up the contrast made it readable.

    • I remember noticing that a teacher in high school had used white-out to hide the marks for the correct multiple choice answers on final exam practice questions before copying them. Then she literally cut-and-pasted questions from the practice questions for the final. I did mediocre on the essay, but got the highest score in the class on the multiple choice questions, because I could see little black dots where the white-out was used.
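The contrast trick from the FOI anecdote above can be illustrated with a rough sketch. It assumes a grayscale scan where marker over blank paper and marker over toner differ by only a few intensity levels; the pixel values and the `stretch_contrast` helper are hypothetical, chosen only to show why a small difference becomes readable once the range is stretched.

```python
def stretch_contrast(gray_rows, low, high):
    """Linearly map the range [low, high] to [0, 255], clipping anything outside it."""
    scale = 255 / (high - low)
    return [
        [max(0, min(255, round((value - low) * scale))) for value in row]
        for row in gray_rows
    ]


# Hypothetical scan of a "redacted" area: marker over blank paper reads 40,
# marker over laser toner reads 30. To the eye both look black.
scan = [
    [40, 30, 30, 40],
    [40, 40, 30, 40],
]

# Stretching the narrow 30..40 band across the full range separates the two.
revealed = stretch_contrast(scan, low=30, high=40)
```

After the stretch, the toner pixels sit at 0 and the marker-only pixels at 255, so whatever was printed under the marker shows up as a clean black-on-white shape.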

  • I thought I understood what was going on, but then I came to the image showing the diff, and I don't understand at all how that diff can unredact anything.

    • It's not that you can unredact them from scratch (you could never get the blue circle back from this software). It's that you can tell which of the redacted images corresponds to which of the origin images. Investigative teams often find themselves in a situation where they have all four images, but need to work out which redacted files are which of the origins. Take, for example, a case where headed paper is otherwise entirely redacted.

      So with this technique, you can definitively say "Redacted-file-A is definitely a redacted version of Origin-file-A". Super useful for identifying forgeries in a stack of otherwise legitimate files.

      Also good for saying "the date on origin-file-B is 1993, and the file you've presented as evidence is provably origin-file-B, so you definitely know of [whatever event] in 1993".

I'll just send an image and not bother with a PDF.

(Note there's also other metadata in a PDF, which you may not want your recipient to know either.)

  • There's also metadata in the image files. What specifically would be sensitive in the metadata of a PDF made from screenshots that is not also present in the screenshot image metadata?

    • PDF has something called an "info dictionary", which most mainstream PDF-writing software will fill out with various bits of info that you might not want known.

      Image files usually have substantially less metadata by default, unless it's one taken by a camera.
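To get a feel for what an info dictionary can reveal, here is a minimal sketch that scans raw PDF bytes for the standard keys. It is only a best-effort sniffer under strong assumptions: the dictionary is stored uncompressed and unencrypted, and the values are simple literal strings. The `sniff_info_entries` name and the sample bytes are made up for illustration; a real audit should use a proper PDF parser and also check XMP metadata.

```python
import re

# Standard keys of the PDF document information dictionary.
INFO_KEYS = (
    "Title", "Author", "Subject", "Keywords",
    "Creator", "Producer", "CreationDate", "ModDate",
)


def sniff_info_entries(pdf_bytes: bytes) -> dict:
    """Best-effort scan for plain-text entries such as `/Author (Jane Doe)`.

    Ignores hex strings, escaped parentheses, and anything inside
    compressed object streams, so an empty result proves nothing.
    """
    found = {}
    for key in INFO_KEYS:
        pattern = rb"/" + key.encode("ascii") + rb"\s*\(([^()\\]*)\)"
        match = re.search(pattern, pdf_bytes)
        if match:
            found[key] = match.group(1).decode("latin-1")
    return found


# Hypothetical fragment of a PDF file, for illustration only.
sample = (
    b"1 0 obj\n"
    b"<< /Producer (ExampleWriter 1.0) /Author (Jane Doe) "
    b"/CreationDate (D:20240101120000Z) >>\n"
    b"endobj\n"
)
entries = sniff_info_entries(sample)
```

Here `entries` ends up holding the author name, the producing software, and the creation timestamp, which is the kind of detail a recipient would not get from a bare screenshot.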

I learned that a long time ago when I was a student and wanted to submit a pdf generated by a trial version of some software as an assignment and was trying to be clever and cover the watermark that said unregistered with a white box.

When opening the file in my slow computer, I could see all the rendering of the watermark happening in slow motion until the white box would pop up on top of the text.

  • When I was a student, and using a shareware or trial version of some software and wanted some printed output from it without a watermark, I printed to PostScript (I chose a printer that supported PostScript, so the driver used it instead of rasterized images), but to a file instead of a printer.

    I could then open up the PostScript, delete the commands that rendered the watermark, save it, and then convert it to PDF so it would be easy to print.

  • It's actually quite easy to open the pdf and see that there are several different elements per page to the document, eg the main text, an image, the footer, the title.

    Randomly removing these by trial and error will usually quite easily allow you to find the watermark and nix it, with the advantage that even a sophisticated recipient will not be able to find out from the pdf file what the watermark was.

Maybe the person tasked with the redacting didn't agree so they chose the worst possible way to do it.

  • Normally, I'd never attribute to intention what can be blamed on incompetence. Especially if the government is doing it. But sure, if I were the intern tasked with this job...

    • > Especially if the government is doing it.

      Also if doing it right means more work?

it's absolutely bewildering how ridiculous everything has been so far in terms of competence, and this really is the cherry on top, near Christmas too.

how much lower can they go ?!

  • USA is still very high, so they can go much much lower, but I think they might go to some still lower places, finding them where we didn't even know such places could exist. Some ideas:

    - Leave NATO

    - Start openly supporting Russia and North Korea

    - Arrest whole International Criminal Court

    - Preventively invade China

  • This low https://en.wikipedia.org/wiki/Child_abuse_in_Pakistan aka a society where child abuse is simply accepted and mainstream, with the child abuse of child labour and dhijhadism being just additional nightmare fuel on top.

    • If we survive long enough I do believe historians will look back on this period and state as a matter of fact, rape and child abuse were completely acceptable, because it seems it’s totally fine with our elected leaders. If these leaders were democratically elected there is only one conclusion to draw from it…

  • I'm not too concerned about the US. They've made their bed.

    I'm more concerned with them dragging everyone else down, and someone much worse taking their place.

Personally, I only trust an image manipulation tool to put down solid colored blocks, or something that does not involve the source pixels when deciding on the redacted pixel. Formats like PDF are just so complicated to trust.
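The "does not involve the source pixels" property can be made concrete with a small sketch. It assumes an image held as a list of rows of `(r, g, b)` tuples; `redact_region` is a hypothetical helper, not any particular tool's API. The point is that the fill colour is a constant, so nothing about the covered pixels survives in the output (unlike blur or pixelation, where every output pixel is computed from the originals).

```python
def redact_region(pixels, top, left, bottom, right, fill=(0, 0, 0)):
    """Return a copy of `pixels` with a rectangle overwritten by a constant colour.

    `pixels` is a list of rows of (r, g, b) tuples. Bounds are half-open:
    rows top..bottom-1 and columns left..right-1 are replaced.
    """
    redacted = [list(row) for row in pixels]
    for y in range(top, bottom):
        for x in range(left, right):
            # The original value at (y, x) is never read, only replaced.
            redacted[y][x] = fill
    return redacted


# Tiny 3x4 test image with a distinct value in every pixel.
image = [[(x * 10, y * 10, 200) for x in range(4)] for y in range(3)]
out = redact_region(image, top=1, left=1, bottom=3, right=3)
```

The input image is left untouched and only the copy is modified, which makes it harder to accidentally ship the original. The result still has to be saved as a fresh image rather than layered over the source.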

The one that was crazy to me is undoing a blur effect (based on its algo), so yeah I also will layer and screenshot something

This is what I do while sharing such images. I crop out those parts first and then take another screenshot. I do not even risk painting over and then take another screenshot. I have been doing this forever.

In practical terms, a more convenient way to achieve this is just printing the document to a PDF, which rasterises the visible layer into what the printer would see. Most PDF tools support this.

  • That seems like a dangerous approach. Though printer drivers do often use rasterization, especially when targeting cheap printers, many printers can render vector graphics and text as well. Print-to-PDF will often use the latter approach, unless of course the source program always rasterizes its output when sending it out to the printer driver, or the Print-to-PDF driver in use is particularly stupid.

I then convert the image to grayscale only. Then I apply a filter so that only 16 colors are used. And I then adjust brightness/contrast so that "white is really white". It's all scripted: "screenshot to PDF". One of my oldest shell scripts.

16 shades of grey (not 50) is plenty enough for text to still be smooth.

I do it for several reasons, one of them being that I often take handwritten notes on official documents (which infuriates my wife btw), but then sometimes I need to scan the documents and send them (local IRS / notary / bank / whatever). So I'll just scan, then fill rectangles with white where I took the notes. Another reason is when paper is printed on two sides: at scan time, if the paper is thin or the ink is thick, the other side sometimes shows through.

I wonder how that'd work vs adversarial inputs: never really thought about it.
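The grayscale / 16 shades / "white is really white" pipeline described above can be sketched roughly as follows. This is not the commenter's shell script: the `clean_scan` name, the Rec. 601 luma weights, the `white_point` threshold of 200, and applying the white-point stretch before quantisation are all assumptions made for illustration.

```python
def clean_scan(pixels, white_point=200, levels=16):
    """Convert RGB rows to grayscale, push near-white to pure white, and quantise.

    `pixels` is a list of rows of (r, g, b) tuples. Any grey value at or above
    `white_point` becomes 255. The output uses at most `levels` distinct greys.
    """
    step = 255 / (levels - 1)  # 17 for 16 levels
    result = []
    for row in pixels:
        out_row = []
        for r, g, b in row:
            grey = 0.299 * r + 0.587 * g + 0.114 * b  # Rec. 601 luma weights
            stretched = min(255.0, grey * 255 / white_point)
            out_row.append(round(round(stretched / step) * step))
        result.append(out_row)
    return result


# Off-white paper, black ink, and a mid-grey pixel.
page = [[(230, 230, 230), (0, 0, 0), (120, 120, 120)]]
cleaned = clean_scan(page)
```

The stretch also wipes out faint show-through from the back of a thin page, as long as it is lighter than the chosen white point. Anything darker than that survives, so this is a cleanup step rather than a redaction method.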