Comment by rkagerer
7 hours ago
Never trust a lawyer with a redact tool any more complicated than a marker.
I've seen lawyers at major, high-priced law firms make this same mistake. Once it was a huge list of individuals' names and bank account balances. Fortunately I was able to intervene just before the uploaded documents were made public.
Folks around here blame incompetence, but I say the frequency of this kind of cock-up is crystal clear telemetry telling you the software tools suck.
If the software is going to leverage the familiarity of using a blackout marker to give you a simple mechanism to redact text, it should honour that analogy and work the way any regular user would expect, by killing off the underlying text you're obscuring, and any other corresponding, hidden bits. Or it should surface those hidden bits so you can see what could come back to bite you later. E.g., it wouldn't be hard to make the redact tool simultaneously act as a highlighter that temporarily turns proximate text in the OCR layer a vibrant yellow as you use it.
It often comes down to not using the right software, plus training issues. They have to use Acrobat, which has a redaction tool. This is expensive, so some places cheap out on other tools that don’t have a real redaction feature. They highlight with black and think it does the same thing, whereas the redaction tool completely removes the content and any associated metadata from the document.
This was basically the only reason we were willing to cough up like $400 for each Acrobat license for a few hundred people. One redaction fuckup could cost you whatever you saved by buying something else.
I would like to believe that the DOJ lacking the proper software might have something to do with DOGE. That would be sweet irony.
I think someone just got free marketing materials to promote the redaction tools.
Now many more people will be aware of the issue.
This is to be expected from an effort like DOGE simply because the E is for Efficiency: how well a system is performing, i.e. the ratio of useful output to input.
Unfortunately the E in DOGE should have been for Effectiveness. That is, is the system shooting at the right target, and how close is it to hitting that target.
You can be very efficient but if you’re doing the wrong thing(s) you’re ultimately wasting resources.
The irony is, DOGE got the E wrong. It’s efficient but not effective.
Not even. Anyone still left at DOJ working to protect the president is immensely corrupt, and this is just the careless stupidity that typically goes along with deeply corrupt people.
Are you saying that only Adobe PDF has proper redaction tools? I did a quick search and found several open source PDF tools claiming to do redaction. Are they all faulty? I would honestly be surprised if there aren't any free tools that do it right.
No that's not what GP is saying. GP is saying that there is software that does not have a redaction feature (perhaps because the developer didn't implement it), but users of the software worked around it by adding a black rectangle to the PDF in such software, falsely believing it to be equivalent to redaction.
Properly implementing redaction is a complicated task. The redaction can be applied to text, so the software needs to find out which text is covered by the rectangle and remove it. The redaction can be applied to images, so the software needs to edit a dizzying array of image formats supported by PDF (including some formats frequently used by PDFs but used basically nowhere else, like JBIG2). The redaction can be applied to invisible text (such as OCR text of a scanned document). The redaction can be applied to vector shapes, so some moderately complicated geometry calculations are needed to break the vector shapes and partially delete them. It's very easy to imagine having a basic PDF editor that does not have a redaction feature because implementing the feature is hard.
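To make the text case concrete, here is a minimal Python sketch of the first step described above: working out which text a redaction rectangle covers and dropping it, including an invisible OCR copy. It uses a toy page model, not a real PDF library. `Rect`, `Span`, and `redact_text` are hypothetical names made up for illustration, and a real tool would also have to split partially covered spans and deal with images and vector shapes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x0: float
    y0: float
    x1: float
    y1: float

    def intersects(self, other: "Rect") -> bool:
        # Two axis-aligned boxes overlap only if they overlap on both axes.
        return (self.x0 < other.x1 and other.x0 < self.x1
                and self.y0 < other.y1 and other.y0 < self.y1)


@dataclass(frozen=True)
class Span:
    text: str
    box: Rect
    visible: bool = True  # False models an invisible OCR text layer


def redact_text(spans: list[Span], area: Rect) -> list[Span]:
    # Keep only spans the redaction area does not touch at all.
    # Visibility is deliberately ignored: hidden OCR text must go too.
    return [s for s in spans if not s.box.intersects(area)]


page = [
    Span("Account holder:", Rect(10, 10, 110, 22)),
    Span("Jane Doe", Rect(120, 10, 180, 22)),
    Span("Jane Doe", Rect(120, 10, 180, 22), visible=False),  # OCR copy
]
remaining = redact_text(page, Rect(115, 8, 185, 24))
print([s.text for s in remaining])
```

Dropping whole spans is the crudest possible policy. It errs on the side of removing too much, which is probably the safer failure mode for redaction.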
> Folks around here blame incompetence, but I say the frequency of this kind of cock-up is crystal clear telemetry telling you the software tools suck.
Absolutely. They know this is confusing, and they're bound and determined not to fix it. At the least, they need a pop-up to let you know that it's not doing what you might think it's doing.
Apple’s Preview app does exactly that. I discovered this while trying to make a blanked copy of kid #2’s homework worksheet for kid #1 who left his at school after kid #2 already wrote on her copy.
I’m optimistic that, because LLMs have brought down the cost of the mere act of typing out code, we will see a shift in focus toward certification and verification, preferably with some of the legal protections for customers that are sorely lacking today.
Apple’s Preview app (which has a very thorough PDF markup tool) does this right: it has an explicit “redact” tool which deletes the content it’s used on.
Always worth remembering that PDF is basically a graphic design format from the early 90s, with roots in 80s-era PostScript. It was never intended for securely redacting documents, and while it can be done, that’s not the default behaviour.
No surprise non-experts muck it up and I don’t see that changing until they move to special-purpose tools.
I think it's part laziness here.
Placing a black rectangle on a PDF is easier than modifying an image or removing text from that same PDF.
I’ve not looked too deeply, but based on other discussion, I wonder if this was malicious noncompliance meant to reveal what the higher-ups were ordering hidden. If victims’ names are properly redacted that would be strong evidence.
The consequences of fucking it up are low, too.
If they get caught, they just take the document down and deny it ever got posted. Claim whatever people can show is a fake.
Since they control the levers of government, there are few with the resources and appetite for holding them accountable. So far, we haven't un-redacted anything too damning, so push hasn't come to shove yet.
That might only change if there's a "blue wave" in the midterms, but even then I wouldn't count on it.
The tool in Acrobat does exactly that: it places black rectangles on stuff. There's a second step you are supposed to do when you have finished marking the redactions, which edits out the content underneath them and offers to sanitize other hidden data:
https://www.adobe.com/acrobat/resources/how-to-redact-a-pdf....
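A rough Python sketch of why the second step matters, using a toy document model rather than Acrobat's actual API. `ToyDocument`, `mark`, `apply`, and `extract_text` are hypothetical names for illustration only. Marking just records where the box goes, and nothing is deleted until the marks are applied.

```python
from dataclasses import dataclass, field

Box = tuple[float, float, float, float]  # (x0, y0, x1, y1)


def overlaps(a: Box, b: Box) -> bool:
    # Axis-aligned boxes overlap only if they overlap on both axes.
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


@dataclass
class ToyDocument:
    words: list[tuple[str, Box]]
    pending: list[Box] = field(default_factory=list)

    def mark(self, area: Box) -> None:
        # Step 1: only records where the black box will be drawn.
        self.pending.append(area)

    def apply(self) -> None:
        # Step 2: actually deletes the words under every marked area.
        self.words = [(w, b) for w, b in self.words
                      if not any(overlaps(b, m) for m in self.pending)]
        self.pending.clear()

    def extract_text(self) -> str:
        return " ".join(w for w, _ in self.words)


doc = ToyDocument([("Balance:", (0, 0, 50, 10)), ("$1,234", (55, 0, 95, 10))])
doc.mark((54, 0, 96, 10))
before = doc.extract_text()  # marking alone leaves the text extractable
doc.apply()
after = doc.extract_text()
print(before)
print(after)
```

Stopping after `mark` is the failure mode in this thread: the page looks redacted, but copy-paste or any text extractor still returns the covered words.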
That failed redactions happen over and over and over is kind of amazing.
I hope you're not blaming the users. It's understandable they would be confused. The software needs to clarify it for the user. Perhaps, when you try to save it, it should warn you that it looks like you tried to redact text, and that text is still embedded in the document and could be extracted. And then direct you to more information on how to complete the redaction.
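A save-time check like that could be sketched roughly as follows in Python. This is a heuristic on a toy model, not any real editor's behaviour. `covered_fraction`, `hidden_but_extractable`, `check_before_save`, and the 0.5 threshold are all assumptions made up for illustration.

```python
import warnings

Box = tuple[float, float, float, float]  # (x0, y0, x1, y1)


def covered_fraction(word: Box, cover: Box) -> float:
    # Area of the overlap divided by the area of the word's own box.
    w = min(word[2], cover[2]) - max(word[0], cover[0])
    h = min(word[3], cover[3]) - max(word[1], cover[1])
    area = (word[2] - word[0]) * (word[3] - word[1])
    if w <= 0 or h <= 0 or area <= 0:
        return 0.0
    return (w * h) / area


def hidden_but_extractable(words, black_rects, threshold=0.5):
    # Words that look blacked out on screen but are still in the file.
    return [text for text, box in words
            if any(covered_fraction(box, r) >= threshold for r in black_rects)]


def check_before_save(words, black_rects):
    leaked = hidden_but_extractable(words, black_rects)
    if leaked:
        warnings.warn(
            f"{len(leaked)} covered word(s) are still embedded and can be "
            "extracted; complete the redaction before publishing."
        )
    return leaked


words = [("Name:", (0, 0, 30, 10)), ("Jane", (35, 0, 60, 10)),
         ("Doe", (65, 0, 85, 10))]
leaked = check_before_save(words, [(34, -1, 86, 11)])
print(leaked)
```

The threshold is a judgment call. A lower value warns about text a box merely clips, while a higher one risks missing a word with a few pixels peeking out.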