It's funny seeing this play out because in my personal life anytime I'm sharing a sensitive document where someone needs to see part of it but I don't want them to see the rest that's not relevant, I'll first block out/redact the text I don't want them to see (covering it, using a redacting highlighter thing, etc.), and then I'll screenshot the page and make that image a PDF.
I always felt paranoid (without any real evidence, just a guess) that there would always be a chance that anything done in software could be reversed somehow.
If it's not done properly, and you happen at any point in the chain to put black blocks on a compressed image (and PDFs do compress internal images), you are leaking some bits of information in the shadow cast by the compression algorithm. (Self-plug: https://github.com/unrealwill/jpguncrop )
And that's just in the non-adversarial simple case.
If you don't know the provenance of the images you are putting black boxes on (for example because of a rogue employee intentionally wanting to leak them, or if the image sensor of your target had been compromised to leak some info by another team), your redaction can be rendered ineffective, as some images can be made uncroppable by construction.
I thought I understood what was going on, but then I came to the image showing the diff and I don't understand at all how that diff can unredact anything.
Normally, I'd never attribute to intention what can be blamed on incompetence. Especially if the government is doing it. But sure, if I were the intern tasked with this job...
There's also metadata in the image files. What specifically would be sensitive in the metadata of a PDF made from screenshots that is not also present in the screenshot image metadata?
It's absolutely bewildering how ridiculous everything has been so far in terms of competence, and this really is the cherry on top, near Christmas too.
USA is still very high, so they can go much much lower, but I think they might go to some still lower places, finding them where we didn't even know such places could exist. Some ideas:
This low https://en.wikipedia.org/wiki/Child_abuse_in_Pakistan aka a society where child abuse is simply accepted and mainstream, with the child abuse of child labour and jihadism being just additional nightmare fuel on top.
Personally, I only trust an image manipulation tool to put down solid colored blocks, or something that does not involve the source pixels when deciding on the redacted pixel. Formats like PDF are just so complicated to trust.
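To make "does not involve the source pixels" concrete, here is a minimal sketch. It assumes the page has already been rasterised into a plain list of rows of grayscale values (0-255); `redact_box` is a hypothetical helper, not any real tool's API. The point is only that the fill value is a constant and the covered pixels are never read, so nothing about them can leak into the result (unlike blur or pixelation).

```python
def redact_box(pixels, left, top, right, bottom, fill=0):
    """Return a copy with the half-open box [left, right) x [top, bottom) set to `fill`."""
    out = [row[:] for row in pixels]  # copy so the original stays untouched
    for y in range(max(top, 0), min(bottom, len(out))):
        row = out[y]
        for x in range(max(left, 0), min(right, len(row))):
            row[x] = fill  # constant value; the source pixel is never read
    return out


# Toy 4x3 "page": bright background with some darker "ink" in the lower left.
page = [
    [200, 201, 199, 202],
    [50, 60, 70, 198],
    [55, 65, 75, 197],
]
redacted = redact_box(page, left=0, top=1, right=3, bottom=3)
```

This only covers the pixel step. If the raster came from a lossy format, or is later wrapped in a container that keeps the original around, the concerns raised elsewhere in the thread still apply.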
This is what I do while sharing such images. I crop out those parts first and then take another screenshot. I do not even risk painting over and then take another screenshot. I have been doing this forever.
In practical terms, a more convenient way to achieve this is just printing the document to a PDF, which rasterises the visible layer into what the printer would see. Most pdf tools support this.
That seems like a dangerous approach. Though printer drivers do often use rasterization, especially when targeting cheap printers, many printers can render vector graphics and text as well. Print-to-PDF will often use the latter approach, unless of course the source program always rasterizes its output when sending it out to the printer driver, or the Print-to-PDF driver in use is particularly stupid.
I then convert the image to grayscale only. Then I apply a filter so that only 16 colors are used. And I then adjust brightness/contrast so that "white is really white". It's all scripted: "screenshot to PDF". One of my oldest shell scripts.
16 shades of grey (not 50) is plenty enough for text to still be smooth.
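Not the actual shell script, but a rough sketch of what those three steps do to the pixels, assuming the scan is a list of rows of (r, g, b) tuples. The luma weights and the `white_point` threshold are my assumptions for illustration; the real script presumably calls an image tool with its own defaults.

```python
def clean_scan(rgb_rows, levels=16, white_point=200):
    """Grayscale, stretch so anything at or above `white_point` is pure white, then quantize."""
    step = 255 / (levels - 1)  # 17.0 for 16 levels
    out = []
    for row in rgb_rows:
        new_row = []
        for r, g, b in row:
            gray = (299 * r + 587 * g + 114 * b) // 1000  # common integer luma approximation
            stretched = min(255, gray * 255 // white_point)  # "white is really white"
            new_row.append(round(round(stretched / step) * step))  # snap to nearest of 16 shades
        out.append(new_row)
    return out


# Off-white paper, black ink, and a mid-grey smudge.
scan = [[(230, 230, 230), (0, 0, 0), (100, 100, 100)]]
cleaned = clean_scan(scan)
```

A side effect of the white clipping is that faint show-through from the back of a thin page tends to get pushed to pure white as well.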
I do it for several reasons, one of them being that I often take manual notes on official documents (which infuriates my wife btw) but then sometimes I need to scan the documents and send them (local IRS / notary / bank / whatever). So I'll just scan, then fill rectangles with white where I took hand notes. Another reason is that when paper is printed on two sides, sometimes if the paper is thin / the ink is thick, the other side will show through in the scan.
I wonder how that'd work vs adversarial inputs: never really thought about it.
Befuddling that this happened again. It’s not the first time:
- Paul Manafort court filing (U.S., 2019)
Manafort’s lawyers filed a PDF where the “redacted” parts were basically black highlighting/boxes over live text. Reporters could recover the hidden text (e.g., via copy/paste).
- TSA “Standard Operating Procedures” manual (U.S., 2009)
A publicly posted TSA screening document used black rectangles that did not remove the underlying text; the concealed content could be extracted. This led to extensive discussion and an Inspector General review.
- UK Ministry of Defence submarine security document (UK, 2011)
A MoD report had “redacted” sections that could be revealed by copying/pasting the “blacked out” text—because the text was still present, just visually obscured.
- Apple v. Samsung ruling (U.S., 2011)
A federal judge’s opinion attempted to redact passages, but the content was still recoverable due to the way the PDF was formatted; copying text out revealed the “redacted” parts.
- Associated Press + Facebook valuation estimate in court transcript (U.S., 2009)
The AP reported it could read “redacted” portions of a court transcript by cut-and-paste (classic overlay-style failure). Secondary coverage notes the mechanism explicitly.
- A broader “history of failures” compilation (multiple orgs / years)
The PDF Association collected multiple incidents (including several above) and describes the common failure mode: black shapes drawn over text without deleting/sanitizing the underlying content.
https://pdfa.org/wp-content/uploads/2020/06/High-Security-PD...
This has happened so many times I feel like the DoJ must have some sort of standardised redaction pipeline to prevent it by now. Assuming they do, why wasn't it used?
I am happy with their lack of expertise and hope it stays that way, because I cannot remember a single case where redactions put the citizenry at a better place for it.
Of course if it's in the middle of an investigation it can spoil the investigation, allow criminals to cover their tracks, allow escape.
In such case the document should be vetted by competent and honest officials to judge whether it is timely to release it, or whether suppressing it just ensures that investigation is never concluded, extending a forever renewed cover to the criminals.
Secure systems are not exactly the right environment for quick release and handling. So documents invariably get onto regular desktops with off the shelf software used by untrained personnel.
> - Associated Press + Facebook valuation estimate in court transcript (U.S., 2009) The AP reported it could read “redacted” portions of a court transcript by cut-and-paste (classic overlay-style failure). Secondary coverage notes the mechanism explicitly.
What happens in a court case when this occurs? Does the receiving party get to review and use the redacted information (assuming it’s not gagged by other means) or do they have to immediately report the error and clean room it?
Edit: after reading up on this it looks like attorneys have strict ethical standards to not use the information (for what little that may be worth), but the Associated Press was a third party who unredacted public court documents in a separate Facebook case.
> What happens in a court case when this occurs? Does the receiving party get to review and use the redacted information (assuming it’s not gagged by other means) or do they have to immediately report the error and clean room it?
Typically, two copies of a redacted document are submitted via ECF. One is an unredacted but sealed copy that is visible to the judge and all parties to the case. The other is a redacted copy that is visible to the general public.
So, to answer what I believe to be your question: the opposing party in a case would typically have an unredacted copy regardless of whether information is leaked to the general public via improper redaction, so the issue you raise is moot.
My guess would be that if the benefitting legal party didn't need to declare they also benefitted from this (because they legally can't be caught, etc.) they wouldn't.
I know and am friends with a lot of lawyers. They're pretty ruthless when it comes to this kind of thing.
Legally, I would think both parties get copies of everything. I don't know if that was the case here.
> Edit: after reading up on this it looks like attorneys have strict ethical standards to not use the information (for what little that may be worth), but the Associated Press was a third party who unredacted public court documents in a separate Facebook case.
Curious. I am not a litigator but this is surprising if you found support for it. My gut was that the general obligation to be a zealous advocate for your client would require a litigant to use inadvertently disclosed information unless it was somehow barred by the court. Confidentiality obligations would remain owed to the client, and there might be some tension there but it would be resolvable.
Given the context and the baldly political direction behind the redactions, it's not at all unlikely that this is the result of deliberate sabotage or malicious compliance. Bondi isn't blacking these things out herself, she's ordering people to do it who aren't true believers. Purges take time (and often blood). She's stuck with the staff trained under previous administrations.
"There are major differences between the Trump 1.0 and 2.0 administrations. In the Trump 1.0 administration, many of the most important officials were very competent men. One example would be then-Attorney General William Barr. Barr is contemptible, yes, but smart AF. When Barr’s DOJ released a redacted version of the Mueller Report, they printed the whole thing, made their redactions with actual ink, and then re-scanned every page to generate a new PDF with absolutely no digital trace of the original PDF file. There are ways to properly redact a PDF digitally, but going analog is foolproof.
The Trump 2.0 administration, in contrast, is staffed top to bottom with fools."
It's like Russian spies being caught in the Netherlands with taxi receipts showing they took a taxi from their Moscow HQ to the airport: corrupt organizations attract/can only hire incompetent people...
Anyone remember how the Trump I regime had staff who couldn't figure out the lighting in the White House, or mistitled Australia's Prime Minister as President?
The bigger difference from my perspective is that they have competent people doing the strategy this time. The last Trump administration failed to use the obvious levers available to accomplish fascism, while this one has been wildly successful on that end. In a few years they will have realigned the whole power dynamic in the country, and unfortunately more and more competent people will choose to work for them in order to receive the benefits of doing so.
It’s easy to appear competent when you’re sitting on your butt doing nothing. What exactly did Barr and Co. accomplish in terms of moving forward the agenda people voted for? These guys were so eager to win accolades from liberals they couldn’t even pick the lowest hanging fruit. Totally pathetic effort after the stellar performance by the legal eagles in the Obama administration. Trump 2.0 is pursuing a very aggressive legal strategy. It has a bunch of very smart people racking up wins in areas such as funding cuts, education, civil rights, deployment of national guard, etc. It also has people that are… struggling. But, unlike with Trump 1.0, they’re actually trying to move the ball forward for their team.
> but smart AF. When Barr’s DOJ released a redacted version of the Mueller Report, they printed the whole thing, made their redactions with actual ink, and then re-scanned every page to generate a new PDF with absolutely no digital trace of the original PDF file.
This is a dumb way of doing that, exactly what "stupid" people do when they are somewhat aware of the limits of their competence, or only as smart as the tech they grew up with. Also, this type of redaction eliminates the possibility of changing the text length, which is a very common leak, especially for various names/official positions. And it doesn't eliminate the risk of non-redaction, since you can't simply search & replace with machine precision but have to manually map each redaction to its printed position.
The covid origins Slack messages discovery material (Anderson & Holmes) were famously poorly redacted pdfs, allowing their unredacting by Gilles Demaneuf, benefiting all of us.
They're not 'hacks'; it's the people doing the redaction making the beginner mistake of not properly removing the selectable text under the redactions. They're either drawing black rectangles over the text or highlighting it black, neither of which prevents the underlying text from being selected.
Keeping that secret would require spontaneous silence from everyone looking at these docs, which is just not possible.
This was my initial reaction to this news. I mean think about it
The Trump team knows that nobody is gonna buy whatever they put out as being the full story. Isn't this just the perfect way to make people feel like they got something they weren't supposed to see? They can increase trust in the output without having to increase trust in the source of it
And as far as I've heard there hasn't been anything "unredacted" that's been of any consequence. It all just feels a little too perfect.
The difference between a black square and the redaction tool is well known to anyone whose job involves redacting PDFs or just working with PDFs. It's most likely that additional staff were pulled in and weren't given enough training.
Colleagues whose full time job is doing this sort of thing for various bits of the government have told me this is exactly the case here. People from all over the government have been deputized to redact these documents with little or no prior training.
My understanding is that many people were fired and replaced by loyalists at the FBI. I think there are a lot of incompetent people working there right now.
Let people believe it's deliberate sabotage. Unfortunately, in real life, minions of a dictator serve the dictator; they don't risk their lives or safety for a noble cause. Any screw-ups are a result of gross incompetence that is typical for every dictatorship.
Given the sheer number of people they had to pull in and work overtime to redact Trump's name as well as those of prominent Republicans and donors as per numerous sources within the FBI and the administration itself, incompetence is likely for a chunk of it.
It’s funny that this effort, the largest exertion of FBI agents second only to 9/11, seems to be unprepared to redact. Cynically, I’m prepared for it to be part of a generative set of PDFs derived from the prompt “create court documents consistent with these 16 PDFs which obscure the role of Donald Trump between 1993 and 1998.”
Any major documents/files have been removed altogether. Then the rest was farmed out to anyone they could find with basic instructions to redact anything embarrassing.
Since there's absolutely zero chance anyone in the administration will ever be held accountable for what's left, they're not overly concerned.
The thing that I've been waiting to see for years is the actual video recordings. There were supposedly cameras everywhere, for years. I'm not even talking about the disgusting stuff, I'm talking security for entrances, hallways, etc.
The FBI definitely has them, where are they?
What about Maxwell's media files? There was nothing found there? Did they subpoena security companies and cloud providers?
The documents are all deniable. Yes video evidence can now be easily faked, but real video will have details that are hard to invent. Regardless, videos are worth millions of words.
Reporting is that they had a basically impossible deadline and they took lawyers off of counterintelligence work to do this. So a conscious act of resistance is possible, but it's a situation where mistakes are likely - people working very quickly trying to meet a deadline and doing work they aren't that familiar with and don't really want to be doing.
It seems like a common tactic by this administration is to just not do what they are required to do until they have been told 50 times and criminal charges are being filed. I suspect the actual truth here is 'don't do this' turned into 'you have 1 day to do this and keep my name out of the release' which led to lots of issues. They probably spent more time deciding the order of pages to release, and how to avoid releasing the things damaging to the administration, than actually doing the work needed to release it. Now they will say 'look, see! You didn't give us enough time and our incompetence is the proof'
For context, lawyers deal with this all the time. In discovery, there is an extensive document ("doc") review process to determine if documents are responsive or non-responsive. For example, let's say I subpoenaed all communication between Bob and Alice between 1 Jan 2019 and 1 Jan 2020 in relation to the purchase of ABC Inc as part of litigation. Every email would be reviewed and if it's relevant to the subpoena, it's marked as responsive, given an identifier and handed over to the other side. Non-responsive communication might be, e.g., attorney-client communications.
It can go further and parts of documents can be viewed as non-responsive and otherwise be blacked out eg the minutes of a meeting that discussed 4 topics and only 1 of them was about the company purchase. That may be commercially sensitive and beyond the scope of the subpoena.
Every such redaction and exclusion has to be logged and a reason given for it being non-responsive where a judge can review that and decide if the reason is good or not, should it ever be an issue. Can lawyers find something damaging and not want to hand it over and just mark it non-responsive? Technically, yes. Kind of. It's a good way to get disbarred or even jailed.
My point with this is that lawyers, which the Department of Justice is full of, are no strangers to this process so should be able to do it adequately. If they reveal something damaging to their client this way, they themselves can get sued for whatever the damages are. So it's something they're careful about, for good reason.
So in my opinion, it's unlikely that this is an act of resistance. Lawyers won't generally commit overt illegal acts, particularly when the only incentive is keeping their job and the downside is losing their career. It could happen.
What I suspect is happening is all the good lawyers simply aren't engaging in this redaction process because they know better, so the DoJ had to wheel out some bad and/or unethical ones who would.
What they're doing is in blatant violation of the law passed last month and good lawyers know it.
There's a lot of this going on at the DoJ currently. Take the recent political prosecutions of James Comey, Letitia James, etc. No good prosecutor is putting their name to those indictments so the administration was forced to bring in incompetent stooges who would. This included former Trump personal attorneys who got improperly appointed as US Attorneys. This got the Comey indictment thrown out.
The law that Ro Khanna and Thomas Massie co-sponsored was sweeping and clear about what needs to be released. The DoJ is trying to protect both members of the administration and powerful people, some of whom are likely big donors and/or foreign government officials or even heads of state.
That's also why this process is so slow I imagine. There are only so many ethically compromised lackeys they can find.
Fine, but the teeth of this act belong to some future justice department. I predict Trump will issue blanket pardons for everyone involved, up to Bondi; and that none of them will respect a congressional subpoena.
> My point with this is that lawyers, which the Department of Justice is full of, are no strangers to this process so should be able to do it adequately. If they reveal something damaging to their client this way, they themselves can get sued for whatever the damages are. So it's something they're careful about, for good reason.
> So in my opinion, it's unlikely that this is an act of resistance. Lawyers won't generally commit overt illegal acts,
Political redaction in this release under the Epstein Transparency Act is an overt, illegal act.
Does that reconfigure your estimation of whether DoJ attorneys that aren't the Trump inner-circle loyalists installed in leadership roles might engage in resistance against (or at least fail to point out methodological flaws in the implementation of) it?
It's not a hack to copy and paste text that is part of the document data. The incompetence of the people responsible for complying with the law doesn't mean it's reasonable to label something a hack.
You guessing my password is not the same as a known and expected behavior of a program. Adobe has a specific feature to redact. PDF is a format known to have layers. Lawyers are trained on day one not to make this mistake. (I am a recovering lawyer). This is either incompetence or deliberate disclosure.
If someone sends me a document with text in it that they meant to remove but didn't, and then I read that text, I haven't hacked anything; they're just incompetent.
Hacking is unauthorised use of a system. Reading a document that was not adequately redacted can hardly be considered hacking.
I’m not an attorney or anything, but the relevant federal statute is explicitly about unauthorized access of computer systems (18 USC 1030).
Opening someone else’s laptop and guessing the password would absolutely fall under that definition, but I think it’s very much questionable if poking around a document that you have legitimately obtained would do so.
But copying and pasting text of publicly released documents is not illegal. Accessing someone’s computer is illegal.
While maybe it could fall under the umbrella of hacking in some general way, articles, and especially titles, should be more precise.
Hacking is any use of a technology in a way that it wasn’t intended. The redaction is so stupid as to almost appear intentional, so maybe you’re right, this isn’t hacking because maybe the information was intended to be discovered.
Yes, this is the digital equivalent of sticking a blank Post-it over text and calling it “redacted”. Mind-boggling that the same mistake has been made over and over again.
Also had this first thought, but then a hack could just be a way around a limit/lack of authorization, doesn't have to be unknown/sophisticated, so copy of black boxes fits
By serving up the PDF file I am being authorized to receive, view, process, etc etc the entire contents. Not just some limited subset. If I wasn't authorized to receive some portion of the file then that needed to be withheld to begin with.
That's entirely different from gaining unauthorized entry to a system and copying out files that were never publicly available to begin with.
To put it simply, I am not responsible for the other party's incompetence.
But this isn’t an unexpected technique; it’s literally the core design of the PDF format. It’s a layered format that preserves the layers on any machine. Adobe has a redaction feature to overcome the default behavior that each layer can be accessed even if there is a top layer in front.
The average office worker has it on their computer, illustrating how commonplace unredacting could be. Any text tool will work, even some designed to detect bad redactions in PDFs via drag and drop (now specifically trained on these known bad redactions). https://github.com/freelawproject/x-ray
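I don't know how x-ray does it internally, so take this as a toy version of the general idea rather than a description of that tool: if you already have the bounding boxes of the text runs and of the filled black rectangles on a page, a bad redaction is simply a text run that still exists underneath a rectangle. The `Box` type, the coordinates and `find_bad_redactions` are all made up for illustration; a real checker would get them from a PDF parser.

```python
from typing import NamedTuple


class Box(NamedTuple):
    x0: float
    y0: float
    x1: float
    y1: float


def overlaps(a, b):
    """True if two axis-aligned boxes share any area."""
    return a.x0 < b.x1 and b.x0 < a.x1 and a.y0 < b.y1 and b.y0 < a.y1


def find_bad_redactions(text_runs, black_rects):
    """Return the text of every run that is still present under a black rectangle."""
    return [text for text, box in text_runs
            if any(overlaps(box, rect) for rect in black_rects)]


# Hypothetical page: two text runs, one of them covered by a black box.
text_runs = [
    ("Dear Ms.", Box(72, 700, 120, 712)),
    ("Jane Doe", Box(124, 700, 170, 712)),
]
black_rects = [Box(122, 698, 172, 714)]
leaked = find_bad_redactions(text_runs, black_rects)
```

With a proper redaction the covered run would not be in `text_runs` at all, so `leaked` would come back empty.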
Not the first time; in 2005 the US report about Nicola Calipari's death in Baghdad was redacted (and unredacted by Italian newspapers) in the same way.
Stupid question: why is the government even allowed to redact stuff? Isn’t the government keeping secrets from the people totally antithetical to democracy?
It's not the government, it's the department of justice. To name two: protection of witnesses, protection of state secrets ("the people" is not a person who can keep secrets).
Right, I’m aware of the excuses the government uses to keep secrets.
But on principle, what right does the government have to keep secrets from its own people? I don’t believe we had that button at the founding, it was added somewhere along the way. I’m asking what is the justification for this, and whether in the grand scheme of things that outweighs the principle of the government not being a separate entity from the people.
There are multiple ways to approach witness protection. For example if we have a problem with witnesses being harmed we could make being involved with witness harm at any layer of indirection a capital offense. We can probably think of other options besides the government being allowed to keep secrets from its own people.
Competence and possibility of malicious compliance are interesting questions, but I think the more appropriate question is if DoJ will be sued for violating the law by redacting unrelated content?
Apart from the technological and procedural question, I would love to learn why the DOJ found it important to protect Indyke. He was Epstein's lawyer, and now we learn that he was personally involved. He is not a Washington person. We expected there to be politically motivated protection of certain people, but is the DOJ just going to blanket protect anybody in the docs?
Indyke works for other powerful people, runs in MAGA circles.
Two things come to mind:
* Some things Indyke did fall outside the scope of lawyer-client privilege. It would be bad for certain people to get him on a stand and force him to spill the beans. He was never interviewed re: Epstein [1]
* He's a very talented lawyer, insofar as a competent lawyer with, at least, extreme discretion, is talented.
He was Epstein’s lawyer, he almost certainly has the dirt on anyone the DoJ wants to protect, and may be the kind of person that would be inclined to burn whoever DoJ was protecting if he wasn't getting treatment at least as favorable.
> [Indyke] was hired by the Parlatore Law Group in 2022, before the justice department settled the Epstein case. That firm represents the defense secretary, Pete Hegseth, and previously represented Donald Trump in his defense against charges stemming from the discovery of classified government documents stored at Trump’s Florida estate.
So I don't know about "not a Washington person", but clearly connections exist to the current administration.
I think even after printing and scanning there could still be JPEG artefacts from the original (e.g. if you scan losslessly).
However, I wonder whether heavily compressing the redacted image would help remove any unwanted artefacts. But the best solution is probably to render the original file from scratch, without compression, before redacting the image.
This is probably just pure stupidity, but part of me hopes there is some tech person in there who knew exactly what they were doing. I’d take a job as a tech person in this administration just to sabotage stuff like this.
PDFs do have a "burn and destroy the parts/layers below" as part of the spec meant explicitly for redaction like this. Apparently they didn't use it, I guess?
The non-complex mafia business has been moot since the 50s already. They run Vegas, most of the big sports leagues, politics, secret services and restaurant chains. Everything which can effectively launder money.
There is a book by Richard Dawkins, "I am me I am free" or something like that, and it has a main picture of Richard standing naked with a private part covered by a black rectangle, but my laptop back then was slow and when you scrolled it would temporarily remove the square for a split second.
Are you sure? I can't find any trace of any book by Richard Dawkins with a title much like that, and that doesn't seem like a very on-brand sort of cover pic for a book by him, and an image search for "Richard Dawkins book cover" doesn't turn up anything like it.
PDF is an absurdly complex file format. It's part of the reason there is no single "good" PDF reader, just a lot of mediocre PDF readers that are all terrible in their own way. Which is a topic for another day.
There are several ways to remove data in a PDF:
- Remove the data. This is much harder than it sounds. Many PDF tools won't let you change the content of a PDF, not because it isn't possible, but because you'll likely massively screw up the formatting, and the tools don't want to deal with that.
- Replace the data. This is what all the "blackout" tools do, find "A" and replace with "🮋". This is effective and doesn't break formatting since it's a 1-to-1 replacement. The problem with "replacing" is that not every PDF tool works the same way, and some, instead, just change the foreground and background color to black; it looks nearly the same, but the power of copy-and-paste still functions.
- Then you have the computer illiterate, who think changing the foreground and background color to black is good enough anyway.
> - Remove the data. This is much harder than it sounds. Many PDF tools won't let you change the content of a PDF, not because it isn't possible, but because you'll likely massively screw up the formatting, and the tools don't want to deal with that.
Compared to other formats this is actually relatively easy in a PDF since the way the text drawing operators work they don't influence the state for arbitrary other content. A lot of positioning in a PDF is absolute (or relative to an explicitly defined matrix which has hardcoded values). Usually this makes editing a PDF harder (since when changing text the related text does not adapt automatically), but when removing data it makes it much easier since you can mostly just delete it without affecting anything else. (There are exceptions for text immediately after the removed data, but that's limited and relatively easy to control.)
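A toy illustration of that point, under heavy assumptions: the content stream below is uncompressed, every run sits in its own `BT ... ET` block with an absolute `Tm` matrix, and strings are plain literals with no escapes. Real files usually have compressed streams, subset-font encodings, `TJ` arrays and so on, so `remove_text` is a hypothetical sketch of the principle, not a redaction tool.

```python
import re

# Three text runs, each positioned absolutely via its own text matrix (Tm).
stream = (
    "BT /F1 12 Tf 1 0 0 1 72 700 Tm (Dear Ms.) Tj ET\n"
    "BT /F1 12 Tf 1 0 0 1 124 700 Tm (Jane Doe) Tj ET\n"
    "BT /F1 12 Tf 1 0 0 1 72 680 Tm (Thank you for your letter.) Tj ET\n"
)

BLOCK = re.compile(r"BT\b.*?\bET\n?", re.S)  # one BT ... ET text object
SHOWN = re.compile(r"\((.*?)\)\s*Tj")        # string operand of a Tj operator


def remove_text(content, secret):
    """Drop every text object whose shown string equals `secret`."""
    def keep_or_drop(match):
        block = match.group(0)
        return "" if secret in SHOWN.findall(block) else block
    return BLOCK.sub(keep_or_drop, content)


cleaned = remove_text(stream, "Jane Doe")
```

Because the remaining runs carry their own coordinates, they render exactly where they did before; the deleted run just leaves a gap.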
> - Replace the data. This is what all the "blackout" tools do, find "A" and replace with "🮋". This is effective and doesn't break formatting since it's a 1-to-1 replacement.
That's actually rather tricky in PDFs since they usually contain embedded subset fonts and these usually do not have "🮋" as part of the subset. Also, doing this would break the layout since "🮋" has a different width than most letters in a typical font, so it would not lead to fewer formatting issues than the previous option. Unless the "🮋" is stretched for each letter to have the same dimensions, but then the stretched characters allow the text to be recovered.
> The problem with "replacing" is that not every PDF tool works the same way, and some, instead, just change the foreground and background color to black; it looks nearly the same, but the power of copy-and-paste still functions.
PDF does not have a concept of a background color. If it looks like a background color in a PDF, you have a rectangle drawn in one color and something in the foreground color in front of it. What you usually see in badly redacted PDF files is exactly this, but in the opposite order: someone just draws a black box on top of the characters. You could argue that this is smarter since it would still work even if someone were to change colors, but of course, PDF is a vector format. If you just add a rectangle, someone else can remove it again. (And also copy & paste doesn't care about your rectangle)
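Here is roughly what that looks like at the operator level, as a minimal sketch with a hand-written, uncompressed content stream (real ones are normally compressed and messier). The text is shown with `Tj`, then a black rectangle is painted over it with `re` and `f`. `extract_text` and `strip_filled_rects` are hypothetical helpers built on simple regexes, only meant to show that the rectangle is a separate drawing instruction.

```python
import re

page = (
    "BT /F1 12 Tf 1 0 0 1 72 700 Tm (Paid to: Jane Doe) Tj ET\n"  # the text itself
    "0 0 0 rg\n"                                                    # fill color: black
    "118 696 60 16 re\n"                                            # rectangle path over the name
    "f\n"                                                           # fill it
)


def extract_text(stream):
    """What copy & paste sees: every string shown with Tj, regardless of what is drawn later."""
    return re.findall(r"\((.*?)\)\s*Tj", stream)


def strip_filled_rects(stream):
    """Remove 'x y w h re' followed by 'f', i.e. delete the black box again."""
    return re.sub(r"^[-\d. ]+ re\nf\n", "", stream, flags=re.M)


visible_to_copy_paste = extract_text(page)
without_box = strip_filled_rects(page)
```

Neither function needs to know anything about the rectangle to get at the text, which is the whole problem.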
>- Remove the data. This is much harder than it sounds. Many PDF tools won't let you change the content of a PDF, not because it isn't possible, but because you'll likely massively screw up the formatting, and the tools don't want to deal with that.
>- Replace the data. This is what all the "blackout" tools do, find "A" and replace with "🮋". This is effective and doesn't break formatting since it's a 1-to-1 replacement. The problem with "replacing" is that not every PDF tool works the same way, and some, instead, just change the foreground and background color to black; it looks nearly the same, but the power of copy-and-paste still functions.
You're making it sound way harder than it is, when both Adobe Acrobat and the built-in Preview app on Mac can competently redact documents. I'm not aware of instances of either (or any other purpose-made redaction tools) failing. I wouldn't homebrew a Python script to do my redaction either, but that doesn't mean doing redactions properly is some insurmountable task for some intern.
Thanks for this. Really quells the urge I get every so often to just code my own PDF editor, because they all suck and certainly it couldn't be THAT hard. Such hubris!
I remember reading the recommendation for journalists to redact documents is to black them out in the digital version, print it out, and re-scan it. Anything else has too many potential ways by which it might be possible to smuggle data.
Even that might be vulnerable to length attacks: one reasonable plaintext would lead to black bars of 1135 px, another to 1138 px, and with enough redactions you can converge on what the plaintext might be.
The only safe way for journalists is to paraphrase what the document said and to say "an unnamed source claims that ..." and to guarantee with your reputation, and the reputation of your publisher, that you are being faithful to what the original source said. For even better results, combine multiple sources.
Unfortunately paraphrasing things and taking editorial responsibility have both been deprecated in favour of rereleasing press releases in the house style, so it's difficult to get the actual journalism these days.
Someone mistook the black highlighter (adds a black square as another layer) for the redaction tool (replaces the data with a black square). If the people doing the redactions are computer-illiterate, they won't see the difference.
They drew black boxes over the text. The text is still underneath. On OCR'd scanned documents, the text you'd copy is actually stored in metadata and just linked by position to the image.
Anyway, if you click on a "redaction", you're clicking on the box and can't select the text underneath, but if you just highlight the text around it, you can copy all the original text.
PDF is less like an image, and more like a web page where elements can be stacked on top of each other. You can visually obscure things by sticking a black rectangle over the top, but anyone who inspects inside the pdf can remove it or see the text in the source.
There would also be a mix of text documents, and image scans. The way to censor each is different.
Perfectly censoring documents, particularly digital ones is actually surprisingly difficult.
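A toy model of the overlay failure described above, as a rough sketch. This is not how any real PDF library represents a page; the `Text` and `Box` classes and all function names are invented for illustration. The page is just a list of objects in paint order.

```python
from dataclasses import dataclass


@dataclass
class Text:
    x: int
    y: int
    width: int
    height: int
    content: str


@dataclass
class Box:
    x: int
    y: int
    width: int
    height: int


def covers(box, text):
    """True if the box fully contains the text's bounding rectangle."""
    return (box.x <= text.x and box.y <= text.y
            and text.x + text.width <= box.x + box.width
            and text.y + text.height <= box.y + box.height)


def visible_text(page):
    """What a reader sees: text not hidden by a box painted later."""
    return [obj.content for i, obj in enumerate(page)
            if isinstance(obj, Text)
            and not any(isinstance(later, Box) and covers(later, obj)
                        for later in page[i + 1:])]


def extract_text(page):
    """What copy-and-paste sees: every text object, covered or not."""
    return [obj.content for obj in page if isinstance(obj, Text)]


def redact(page, box):
    """Proper redaction: delete the covered text, then draw the box.
    (Toy version: only handles text that is fully inside the box.)"""
    kept = [obj for obj in page if not (isinstance(obj, Text) and covers(box, obj))]
    return kept + [box]


page = [Text(10, 10, 80, 12, "Witness:"), Text(100, 10, 90, 12, "Jane Doe")]
bar = Box(95, 8, 100, 16)

overlaid = page + [bar]            # the "black highlighter" approach
print(visible_text(overlaid))      # looks redacted
print(extract_text(overlaid))      # but the name is still in the file
print(extract_text(redact(page, bar)))
```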
Probably the Underhanded C Contest (https://www.underhanded-c.org/_page_id_17.html) but yeah. Obfuscated C Contest entries usually aren't underhanded, just intentionally obscure about what they do or how they do it.
ah, found it - this is from the 'Court Records' part.
https://www.justice.gov/multimedia/Court Records/Matter of the Estate of Jeffrey E. Epstein, Deceased, No. ST-21-RV-00005 (V.I. Super. Ct. 2021)/2022.03.17-1 Exhibit 1.pdf
Copying and pasting doesn't work. Unless your PDF viewer does OCR. And if the redaction is just a black rectangle overlaid on top, that can still be removed.
I love how the entire internet thinks that this is a big deal when all that happened is that USDOJ re-posted some poorly-redacted court documents that were poorly redacted by non-USDOJ attorneys more than three years ago.
Yes, USDOJ is incompetent and dysfunctional, but this is not why. But sure, whatever, carry on...
They are unredacted because either those in charge are not familiar with basic office tasks, or someone wanted this stuff to leak and nobody checked their work. Either brand of incompetence should cause heads to roll. But, just like the Signal fiasco, nothing will happen. When your brand is perfection, you cannot ever admit a mistake.
So is the data extracted the names of the victims that were supposed to be hidden to protect them? Or is there something else that might be worthy of exposing?
The downvoters assume that it is a bad-faith question. The downvoters are 99% right with that. If the 1% hits, then OP is just exceedingly naive and did not follow the scandal, in which case they should maybe first do some reading.
The names of involved powerful people were NOT supposed to be censored. All those names except Bill Clinton's were redacted. To protect Trump and everybody else involved in the scandal except said Bill Clinton. But especially to protect Trump.
It's certainly possible that some of the underlings are deliberately sabotaging orders from above. It's also possible that they're incompetent, as so many of the Trump team are. How would we know which it is?
Did we learn anything useful or is it exactly as I said in the other thread, which got downvoted to hell, that all the really juicy blackmail material is with the CIA and will never see the light of day?
Won't know until all the documents are released. The blackmail is undeniable. But what's more interesting is who else was involved. Who purchased his services? That's what they are trying to hide.
Regardless of the content itself, naive redaction of a high profile PDF still exposing the text contents is something that seems relevant to the community. Maybe you are in the wrong place?
“Like you guys have had this stuff for a year. Doesn’t it seem like you could just throw all that into AI at this stage of the game? And just redact the names of the victims, and let’s go.” Joe Rogan
it's even less impressive; somebody left the credentials typed into the text boxes and went to get a slimfast out of the staff breakroom and you walked into the computer lab and hit enter.
I think this is a good thing. I think the people talking dictator this and that do not understand we have the ability to critique the administration. What we lack is control of the underhanded lobbyism. It is a warped democracy but still a democracy.
It's funny seeing this play out because in my personal life anytime I'm sharing a sensitive document where someone needs to see part of it but I don't want them to see the rest that's not relevant, I'll first block out/redact the text I don't want them to see (covering it, using a redacting highlighter thing, etc.), and then I'll screenshot the page and make that image a PDF.
I always felt paranoid (without any real evidence, just a guess) that there would always be a chance that anything done in software could be reversed somehow.
If it's not done properly, and you happen at any point in the chain to put black blocks on a compressed image (and PDFs do compress internal images), you are leaking some bits of information in the shadow cast by the compression algorithm. (Self-plug: https://github.com/unrealwill/jpguncrop )
And that's just in the non-adversarial simple case.
If you don't know the provenance of the images you are putting black boxes on (for example because of a rogue employee intentionally wanting to leak them, or if the image sensor of your target had been compromised to leak some info by another team), your redaction can be rendered ineffective, as some images can be made uncroppable by construction.
(Self-plug : https://github.com/unrealwill/uncroppable )
And also be aware that compression is hiding everywhere : https://en.wikipedia.org/wiki/Compressed_sensing
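A rough 1-D illustration of that compression "shadow". This is not the method used in the linked repositories; it is a toy that assumes an 8-sample block and replaces JPEG's quantization with a crude stand-in (keep only the two lowest-frequency DCT coefficients). Two blocks with identical visible left halves are compressed, then the right half is blacked out on the decoded result.

```python
import math

N = 8  # samples per block, like one row of a JPEG 8x8 block


def _scale(k):
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def dct(block):
    """Orthonormal DCT-II of an N-sample block."""
    return [_scale(k) * sum(x * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                            for n, x in enumerate(block))
            for k in range(N)]


def idct(coeffs):
    """Inverse of dct()."""
    return [sum(_scale(k) * c * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for k, c in enumerate(coeffs))
            for n in range(N)]


def lossy_roundtrip(block, keep=2):
    """Crude stand-in for lossy compression: drop all but the lowest `keep`
    coefficients, then decode and round back to integer pixel values."""
    kept = [c if k < keep else 0.0 for k, c in enumerate(dct(block))]
    return [round(v) for v in idct(kept)]


def redact_right_half(block):
    """Paint the right half black AFTER decoding."""
    return block[:N // 2] + [0] * (N // 2)


flat = [100] * 8                 # hidden half is the same as the visible half
edge = [100] * 4 + [200] * 4     # hidden half is brighter

print(redact_right_half(lossy_roundtrip(flat)))
print(redact_right_half(lossy_roundtrip(edge)))
```

Both inputs had the same visible left half before compression, and both have a solid black right half after redaction, yet the surviving left halves differ. That difference is information about the covered pixels.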
I was thinking I understand what's going on but then I came to the image showing the diff and I don't understand at all how that diff can unredact anything.
Maybe the person tasked with the redacting didn't agree so they chose the worst possible way to do it.
Normally, I'd never attribute to intention what can be blamed on incompetence. Especially if the government is doing it. But sure, if I were the intern tasked with this job...
I'll just send an image and not bother with a PDF.
(Note there's also other metadata in a PDF, which you may not want your recipient to know either.)
There's also metadata in the image files. What specifically would be sensitive in the PDF-with-screenshots metadata that is not also present in the screenshot image metadata?
it's absolutely bewildering how ridiculous everything has been so far in terms of competence and this really takes the cherry on the top near Christmas too.
how much lower can they go ?!
USA is still very high, so they can go much much lower, but I think they might go to some still lower places, finding them where we didn't even know such places could exist. Some ideas:
- Leave NATO
- Start openly supporting Russia and North Korea
- Arrest whole International Criminal Court
- Preventively invade China
This low https://en.wikipedia.org/wiki/Child_abuse_in_Pakistan aka a society where child abuse is simply accepted and mainstream, with the child abuse of child labour and dhijhadism being just additional nightmare fuel on top.
I'm not too concerned about the US. They've made their bed.
I'm more concerned with them dragging everyone else down, and someone much worse taking their place.
Maybe it was always part of the plan. Plausible Deniability.
The really interesting bit is whether they can go another term.
Personally, I only trust an image manipulation tool to put down solid colored blocks, or something that does not involve the source pixels when deciding on the redacted pixel. Formats like PDF are just so complicated to trust.
The one that was crazy to me is undoing a blur effect (based on its algo), so yeah I also will layer and screenshot something
And even being this careful, if the opacity is slightly off it could be undone
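A small sketch of the opacity problem, assuming 8-bit grayscale pixels and a black overlay at 96% opacity (both numbers are arbitrary choices for the example). The covered row looks black to the eye, but a simple contrast stretch brings the text back.

```python
def overlay_black(row, alpha):
    """Composite a black box with the given opacity over 8-bit gray pixels."""
    return [round((1 - alpha) * v) for v in row]


def stretch_contrast(row):
    """Rescale so the brightest pixel becomes 255 (what 'levels' tools do)."""
    peak = max(row)
    if peak == 0:
        return list(row)  # truly opaque: nothing left to recover
    return [round(v * 255 / peak) for v in row]


original = [255, 255, 0, 0, 255, 0, 255, 255]   # dark text on white paper

almost_black = overlay_black(original, alpha=0.96)
print(almost_black)                     # values between 0 and 10
print(stretch_contrast(almost_black))   # the original pattern again

fully_black = overlay_black(original, alpha=1.0)
print(stretch_contrast(fully_black))    # all zeros
```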
This is what I do while sharing such images. I crop out those parts first and then take another screenshot. I do not even risk painting over and then take another screenshot. I have been doing this forever.
In practical terms, a more convenient way to achieve this is just printing the document to a PDF, which rasterises the visible layer into what the printer would see. Most pdf tools support this.
That seems like a dangerous approach. Though printer drivers do often use rasterization, especially when targeting cheap printers, many printers can render vector graphics and text as well. Print-to-PDF will often use the latter approach, unless of course the source program always rasterizes its output when sending it to the printer driver, or the Print-to-PDF driver in use is particularly stupid.
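One rough way to check what a print-to-PDF step actually produced is to look for text-showing operators (`Tj` / `TJ` inside `BT ... ET`) in the content streams. The sketch below is a heuristic only, not a validator: it assumes streams are either uncompressed or Flate-compressed, ignores other filters, object streams and encryption, and binary image data could in principle trigger a false positive. The function name is made up for illustration.

```python
import re
import zlib

STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)
TEXT_OP_RE = re.compile(rb"\bBT\b.*?\b(Tj|TJ)\b", re.DOTALL)


def has_text_operators(pdf_bytes):
    """Heuristic: does any stream contain a text-showing operator?"""
    for match in STREAM_RE.finditer(pdf_bytes):
        data = match.group(1)
        try:
            # decompressobj tolerates a slightly truncated stream
            data = zlib.decompressobj().decompress(data)
        except zlib.error:
            pass  # not Flate data: scan the raw bytes instead
        if TEXT_OP_RE.search(data):
            return True
    return False


def fake_stream(body):
    """Wrap bytes in a minimal stream object for the demo (not a full PDF)."""
    return b"4 0 obj\n<< >>\nstream\n" + body + b"\nendstream\nendobj\n"


text_ops = b"BT /F1 12 Tf 72 720 Td (secret name) Tj ET"
image_ops = b"q 612 0 0 792 0 0 cm /Im0 Do Q"

print(has_text_operators(fake_stream(text_ops)))                 # vector text
print(has_text_operators(fake_stream(zlib.compress(text_ops))))  # compressed text
print(has_text_operators(fake_stream(image_ops)))                # image only
```

A `False` result does not prove the file is safe; it only means this particular check found no live text.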
I feel the same and do the same.
Me too.
I then convert the image to grayscale only. Then I apply a filter so that only 16 colors are used. And I then adjust brightness/contrast so that "white is really white". It's all scripted: "screenshot to PDF". One of my oldest shell scripts.
16 shades of grey (not 50) is plenty enough for text to still be smooth.
I do it for several reasons, one of them being that I often take manual notes on official documents (which infuriates my wife btw) but then sometimes I need to scan the documents and send them (local IRS / notary / bank / whatever). So I'll just scan, then fill rectangles with white where I took handwritten notes. Another reason is paper printed on two sides: when scanning, if the paper is thin or the ink is thick, the other side sometimes shows through.
I wonder how that'd work vs adversarial inputs: never really thought about it.
care to share the script?
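Not the commenter's script (which was described as a shell script and never posted), but the three steps described above can be sketched per pixel in plain Python. The luma weights are the common Rec. 601 ones; the white point of 230 and the order of the last two steps are arbitrary assumptions.

```python
def to_gray(rgb):
    """RGB -> 8-bit gray using Rec. 601 luma weights."""
    r, g, b = rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def stretch_white(value, white_point=230):
    """Push anything at or above white_point to pure white."""
    return min(255, round(value * 255 / white_point))


def quantize16(value):
    """Snap to one of 16 evenly spaced gray levels (0, 17, ..., 255)."""
    return round(value / 17) * 17


def clean_pixel(rgb, white_point=230):
    return quantize16(stretch_white(to_gray(rgb), white_point))


def clean_image(rows, white_point=230):
    """rows: list of rows, each a list of (r, g, b) tuples."""
    return [[clean_pixel(px, white_point) for px in row] for row in rows]


scan = [[(250, 248, 240), (128, 128, 128), (0, 0, 0)]]  # paper, smudge, ink
print(clean_image(scan))
```

This flattens faint show-through and paper tint, but it makes no claim about adversarial inputs, which is the open question raised above.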
Befuddling that this happened again. It’s not the first time:
- Paul Manafort court filing (U.S., 2019) Manafort’s lawyers filed a PDF where the “redacted” parts were basically black highlighting/boxes over live text. Reporters could recover the hidden text (e.g., via copy/paste).
- TSA “Standard Operating Procedures” manual (U.S., 2009) A publicly posted TSA screening document used black rectangles that did not remove the underlying text; the concealed content could be extracted. This led to extensive discussion and an Inspector General review.
- UK Ministry of Defence submarine security document (UK, 2011) A MoD report had “redacted” sections that could be revealed by copying/pasting the “blacked out” text—because the text was still present, just visually obscured.
- Apple v. Samsung ruling (U.S., 2011) A federal judge’s opinion attempted to redact passages, but the content was still recoverable due to the way the PDF was formatted; copying text out revealed the “redacted” parts.
- Associated Press + Facebook valuation estimate in court transcript (U.S., 2009) The AP reported it could read “redacted” portions of a court transcript by cut-and-paste (classic overlay-style failure). Secondary coverage notes the mechanism explicitly.
- A broader “history of failures” compilation (multiple orgs / years) The PDF Association collected multiple incidents (including several above) and describes the common failure mode: black shapes drawn over text without deleting/sanitizing the underlying content. https://pdfa.org/wp-content/uploads/2020/06/High-Security-PD...
This has happened so many times I feel like the DoJ must have some sort of standardised redaction pipeline to prevent it by now. Assuming they do, why wasn't it used?
I am happy with their lack of expertise and hope it stays that way, because I cannot remember a single case where redactions put the citizenry at a better place for it.
Of course if it's in the middle of an investigation it can spoil the investigation, allow criminals to cover their tracks, allow escape.
In such case the document should be vetted by competent and honest officials to judge whether it is timely to release it, or whether suppressing it just ensures that investigation is never concluded, extending a forever renewed cover to the criminals.
Secure systems are not exactly the right environment for quick release and handling. So documents invariably get onto regular desktops with off the shelf software used by untrained personnel.
I want to believe this is malicious compliance.
Since hundreds of people were involved the most likely explanation is incompetence
> - Associated Press + Facebook valuation estimate in court transcript (U.S., 2009) The AP reported it could read “redacted” portions of a court transcript by cut-and-paste (classic overlay-style failure). Secondary coverage notes the mechanism explicitly.
What happens in a court case when this occurs? Does the receiving party get to review and use the redacted information (assuming it’s not gagged by other means) or do they have to immediately report the error and clean room it?
Edit: after reading up on this it looks like attorneys have strict ethical standards to not use the information (for what little that may be worth), but the Associated Press was a third party who unredacted public court documents in a separate Facebook case.
> What happens in a court case when this occurs? Does the receiving party get to review and use the redacted information (assuming it’s not gagged by other means) or do they have to immediately report the error and clean room it?
Typically, two copies of a redacted document are submitted via ECF. One is an unredacted but sealed copy that is visible to the judge and all parties to the case. The other is a redacted copy that is visible to the general public.
So, to answer what I believe to be your question: the opposing party in a case would typically have an unredacted copy regardless of whether information is leaked to the general public via improper redaction, so the issue you raise is moot.
My guess would be that if the benefitting legal party didn't need to declare they also benefitted from this (because they legally can't be caught, etc.) they wouldn't.
I know and am friends with a lot of lawyers. They're pretty ruthless when it comes to this kind of thing.
Legally, I would think both parties get copies of everything. I don't know if that was the case here.
> Edit: after reading up on this it looks like attorneys have strict ethical standards to not use the information (for what little that may be worth), but the Associated Press was a third party who unredacted public court documents in a separate Facebook case.
Curious. I am not a litigator but this is surprising if you found support for it. My gut was that the general obligation to be a zealous advocate for your client would require a litigant to use inadvertently disclosed information unless it was somehow barred by the court. Confidentiality obligations would remain owed to the client, and there might be some tension there but it would be resolvable.
Follow the letter of the law, but not the spirit.
It already seems that they blacked out more than the law allowed, so following neither.
Not that it matters much what the law says if the goal is to protect the man who hands out pardons...
Given the context and the baldly political direction behind the redactions, it's not at all unlikely that this is the result of deliberate sabotage or malicious compliance. Bondi isn't blacking these things out herself, she's ordering people to do it who aren't true believers. Purges take time (and often blood). She's stuck with the staff trained under previous administrations.
Or it is just the result of firing people who were competent and giving insufficient training to people who had never done this before.
"There are major differences between the Trump 1.0 and 2.0 administrations. In the Trump 1.0 administration, many of the most important officials were very competent men. One example would be then-Attorney General William Barr. Barr is contemptible, yes, but smart AF. When Barr’s DOJ released a redacted version of the Mueller Report, they printed the whole thing, made their redactions with actual ink, and then re-scanned every page to generate a new PDF with absolutely no digital trace of the original PDF file. There are ways to properly redact a PDF digitally, but going analog is foolproof.
The Trump 2.0 administration, in contrast, is staffed top to bottom with fools."
https://daringfireball.net/linked/2025/12/23/trump-doj-pdf-r...
> made their redactions with actual ink, and then re-scanned every page
That's not very competent.
> going analog is foolproof
Absolutely not. There are many ways to f this up. Just the smallest variation in places that have been inked twice will reveal the clear text.
It's like Russian spies being caught in the Netherlands with taxi receipts showing they took a taxi from their Moscow HQ to the airport: corrupt organizations attract/can only hire incompetent people...
https://www.vice.com/en/article/russian-spies-chemical-weapo...
Anyone remember how the Trump I regime had staff who couldn't figure out the lighting in the White House, or mistitled Australia's Prime Minister as President?
I would just do the digital version of that: add 100% black bars then screenshot page by page and probably increase the contrast too.
The bigger difference from my perspective is that they have competent people doing the strategy this time. The last Trump administration failed to use the obvious levers available to accomplish fascism, while this one has been wildly successful on that end. In a few years they will have realigned the whole power dynamic in the country, and unfortunately more and more competent people will choose to work for them in order to receive the benefits of doing so.
> William Barr. Barr is contemptible, yes, but smart AF
You mean the guy who covered up for Epstein's 'suicide' and expected us morons to believe it?
It’s easy to appear competent when you’re sitting on your butt doing nothing. What exactly did Barr and Co. accomplish in terms of moving forward the agenda people voted for? These guys were so eager to win accolades from liberals they couldn’t even pick the lowest hanging fruit. Totally pathetic effort after the stellar performance by the legal eagles in the Obama administration. Trump 2.0 is pursuing a very aggressive legal strategy. It has a bunch of very smart people racking up wins in areas such as funding cuts, education, civil rights, deployment of national guard, etc. It also has people that are… struggling. But, unlike with Trump 1.0, they’re actually trying to move the ball forward for their team.
> but smart AF. When Barr’s DOJ released a redacted version of the Mueller Report, they printed the whole thing, made their redactions with actual ink, and then re-scanned every page to generate a new PDF with absolutely no digital trace of the original PDF file.
This is a dumb way of doing that, exactly what "stupid" people do when they are somewhat aware of the limits of their competence or are only as smart as the tech they grew up with. Also, this type of redaction eliminates the possibility of changing the text length, which is a very common leak, especially for names and official positions. And it doesn't eliminate the risk of missed redactions, since you can't simply search and replace with machine precision but have to manually map each item to its printed position.
The covid origins Slack messages discovery material (Anderson & Holmes) were famously poorly redacted pdfs, allowing their unredacting by Gilles Demaneuf, benefiting all of us.
[flagged]
You mean the layers that were, in fact, just side effects of scanning the (non-authoritative) short form certificate?
"Never interrupt your enemy when he is making a mistake" - Napoleon Bonaparte
Let all the files get released first.
Then show your hacks.
They're not 'hacks' it's the people doing the redaction making beginner mistakes of not properly removing the selectable text under the redactions. They're either drawing black rectangles over the text or highlighting it black neither of which prevents the underlying text from being selected.
Keeping that secret would require spontaneous silence from everyone looking at these docs, which is just not possible.
Yes but don't tell them they're doing it wrong.
Also don’t assume the mistake wasn’t intentional.
This was my initial reaction to this news. I mean think about it
The Trump team knows that nobody is gonna buy whatever they put out as being the full story. Isn't this just the perfect way to make people feel like they got something they weren't supposed to see? They can increase trust in the output without having to increase trust in the source of it
And as far as I've heard there hasn't been anything "unredacted" that's been of any consequence. It all just feels a little too perfect.
"Never ascribe to malice that which is adequately explained by incompetence."
Too late. The data has been touched far too many times. The chain of custody and any accountability will never happen.
I wonder if any of this is a conscious act of resistance vs. just incompetence.
And yes, I've heard of Hanlon's Razor haha
https://en.wikipedia.org/wiki/Hanlon%27s_razor
The difference between a black square and the redaction tool is well known to anyone whose job involves redacting PDFs or just working with PDFs. Most likely, additional staff were pulled in and weren't given enough training.
Colleagues whose full time job is doing this sort of thing for various bits of the government have told me this is exactly the case here. People from all over the government have been deputized to redact these documents with little or no prior training.
My understanding is that many people were fired and replaced by loyalists at the FBI. I think there are a lot of incompetent people working there right now.
Yeah — don't attribute to resistance what can adequately be explained by idiocy.
Let people believe it's deliberate sabotage. Unfortunately, in real life, minions of a dictator serve the dictator; they don't risk their lives or safety for a noble cause. Any screw-ups are a result of gross incompetence that is typical for every dictatorship.
A third possibility is diversion, while the most damaging evidence would be suppressed a different way.
Another option: also change some of the text underneath.
Given the sheer number of people they had to pull in and work overtime to redact Trump's name as well as those of prominent Republicans and donors as per numerous sources within the FBI and the administration itself, incompetence is likely for a chunk of it.
It’s funny that this effort, the largest exertion of FBI agents second only to 9/11, seems to be unprepared to redact. Cynically, I’m prepared for it to be part of a generative set of PDFs derived from the prompt “create court documents consistent with these 16 PDFs which obscure the role of Donald Trump between 1993 and 1998.”
There's a third option: Ambivalence.
Any major documents/files have been removed altogether. Then the rest was farmed out to anyone they could find with basic instructions to redact anything embarrassing.
Since there's absolutely zero chance anyone in the administration will ever be held accountable for what's left, they're not overly concerned.
The thing that I've been waiting to see for years is the actual video recordings. There were supposedly cameras everywhere, for years. I'm not even talking about the disgusting stuff, I'm talking security for entrances, hallways, etc.
The FBI definitely has them, where are they?
What about Maxwell's media files? There was nothing found there? Did they subpoena security companies and cloud providers?
The documents are all deniable. Yes video evidence can now be easily faked, but real video will have details that are hard to invent. Regardless, videos are worth millions of words.
Reporting is that they had a basically impossible deadline and they took lawyers off of counterintelligence work to do this. So a conscious act of resistance is possible, but it's a situation where mistakes are likely - people working very quickly trying to meet a deadline and doing work they aren't that familiar with and don't really want to be doing.
It seems like a common tactic by this administration is to just not do what they are required to do until they have been told 50 times and criminal charges are being filed. I suspect the actual truth here is 'don't do this' turned into 'you have 1 day to do this and keep my name out of the release' which led to lots of issues. They probably spent more time deciding the order of pages to release, and how to avoid releasing the things damaging to the administration, than actually doing the work needed to release it. Now they will say 'look, see! You didn't give us enough time and our incompetence is the proof'
Considering the Comey, James, and Adams debacles, it seems quite likely they've purged most people with a shred of competence.
The 'resistance' was not releasing them during the last administration.
It's a good question.
For context, lawyers deal with this all the time. In discovery, there is an extensive document ("doc") review process to determine if documents are responsive or non-responsive. For example, let's say I subpoenaed all communication between Bob and Alice between 1 Jan 2019 and 1 Jan 2020 in relation to the purchase of ABC Inc as part of litigation. Every email would be reviewed and if it's relevant to the subpoena, it's marked as responsive, given an identifier and handed over to the other side. Non-responsive communication might not be eg attorney-client communications.
It can go further and parts of documents can be viewed as non-responsive and otherwise be blacked out eg the minutes of a meeting that discussed 4 topics and only 1 of them was about the company purchase. That may be commercially sensitive and beyond the scope of the subpoena.
Every such redaction and exclusion has to be logged and a reason given for it being non-responsive where a judge can review that and decide if the reason is good or not, should it ever be an issue. Can lawyers find something damaging and not want to hand it over and just mark it non-responsive? Technically, yes. Kind of. It's a good way to get disbarred or even jailed.
My point with this is that lawyers, which the Department of Justice is full of, are no strangers to this process so should be able to do it adequately. If they reveal something damaging to their client this way, they themselves can get sued for whatever the damages are. So it's something they're careful about, for good reason.
So in my opinion, it's unlikely that this is an act of resistance. Lawyers won't generally commit overt illegal acts, particularly when the only incentive is keeping their job and the downside is losing their career. It could happen.
What I suspect is happening is that all the good lawyers simply aren't engaging in this redaction process because they know better, so the DoJ had to wheel out some bad and/or unethical ones who would.
What they're doing is in blatant violation to the law passed last month and good lawyers know it.
There's a lot of this going on at the DoJ currently. Take the recent political prosecutions of James Comey, Letitia James, etc. No good prosecutor is putting their name to those indictments, so the administration was forced to bring in incompetent stooges who would. This included former Trump personal attorneys who got improperly appointed as US Attorneys. This got the Comey indictment thrown out.
The law that Ro Khanna and Thomas Massie co-sponsored was sweeping and clear about what needs to be released. The DoJ is trying to protect both members of the administration and powerful people, some of whom are likely big donors and/or foreign government officials or even heads of state.
That's also why this process is so slow I imagine. There are only so many ethically compromised lackeys they can find.
Fine, but the teeth of this act belong to some future justice department. I predict Trump will issue blanket pardons for everyone involved, up to Bondi; and that none of them will respect a congressional subpoena.
> My point with this is that lawyers, which the Department of Justice is full of, are no strangers to this process so should be able to do it adequately. If they reveal something damaging to their client this way, they themselves can get sued for whatever the damages are. So it's something they're careful about, for good reason.
> So in my opinion, it's unlikely that this is an act of resistance. Lawyers won't generally commit overt illegal acts,
Political redaction in this release under the Epstein Transparency Act is an overt, illegal act.
Does that reconfigure your estimation of whether DoJ attorneys who aren't the Trump inner-circle loyalists installed in leadership roles might engage in resistance against (or at least fail to point out methodological flaws in the implementation of) it?
It's not a hack to copy and paste text that is part of the document data. The incompetence of the people responsible for complying with the law doesn't mean it's reasonable to label something a hack.
Please change the title.
If I open your laptop and guess your password then that counts as hacking you in both legal and security terms
You don't need to do some sophisticated thing for it to be considered hacking
You guessing my password is not the same as a known and expected behavior of a program. Adobe has a specific feature to redact. PDF is a format known to have layers. Lawyers are trained on day one not to make this mistake. (I am a recovering lawyer). This is either incompetence or deliberate disclosure.
If you were blind, would a screen reader read the documents? That's not a hack.
If someone sends me a document with text in it that they meant to remove but didn't and then I read that text, I haven't hacked anything they're just incompetent.
Hacking is unauthorised use of a system. Reading a document that was not adequately redacted can hardly be considered hacking.
I’m not an attorney or anything, but the relevant federal statute is explicitly about unauthorized access of computer systems (18 USC 1030).
Opening someone else’s laptop and guessing the password would absolutely fall under that definition, but I think it’s very much questionable if poking around a document that you have legitimately obtained would do so.
But copying and pasting text of publicly released documents is not illegal. Accessing someone’s computer is illegal. While maybe it could fall under the umbrella of hacking in some general way, articles, and especially titles, should be more precise.
I guess but if you write something down real small and I squint at it is that still hacking?
It's being "undone with the lamest hack known to mankind."
Still technically a hack.
It’s not a hack. It’s known, expected behavior of the program. Adobe has a specific feature to redact. Color filled boxes is not it.
Hacking is any use of a technology in a way that it wasn’t intended. The redaction is so stupid as to almost appear intentional, so maybe you’re right, this isn’t hacking because maybe the information was intended to be discovered.
Yes, this is the digital equivalent of sticking a blank Post-it over text and calling it “redacted”. Mind-boggling that the same mistake has been made over and over again.
Also had this first thought, but then a hack could just be a way around a limit/lack of authorization, doesn't have to be unknown/sophisticated, so copy of black boxes fits
> limit/lack of authorization
By serving up the PDF file I am being authorized to receive, view, process, etc etc the entire contents. Not just some limited subset. If I wasn't authorized to receive some portion of the file then that needed to be withheld to begin with.
That's entirely different from gaining unauthorized entry to a system and copying out files that were never publicly available to begin with.
To put it simply, I am not responsible for the other party's incompetence.
And the title should briefly describe the “hack” as well
Not the only thing hack means now, or the most common usage anymore. See "life hack" - it means unexpected technique.
But this isn’t an unexpected technique it’s literally the core design of the pdf format. It’s a layered format that preserves the layers on any machine. Adobe has a redaction feature to overcome the default behavior that each layer can be accessed even if there is a top layer in front.
It's also the meaning used in the title of this very Web site.
Man, if you can do this, you should keep it secret until they release more bad redactions...
Shout out to Stirling PDF that can be self hosted and has a relatively robust and easy to use redaction tool. All for free.... For now....
It's quite funny really. Apparently you just cut and paste the text into Word. They just had the pdf put black rectangles on top.
Why into Word specifically?
You have a better editor?
The average office worker has it on their computer, illustrating how commonplace unredacting could be. Any text tool will work, even some designed to detect bad redactions in PDFs via drag and drop (now specifically trained on these known bad redactions). https://github.com/freelawproject/x-ray
Why reveal the trick before all the papers have been released?
Someone wanted to make sure to be the first?
IKR?!
I don't think there is a grand conspiracy here. Any schmoe can download these files, select with their mouse, and copy paste into a document.
I "hacked" my Facebook account the other day. I forgot my password and used the "forgot password" link to gain access.
Not the first time; in 2005 the US report about Nicola Calipari's death in Baghdad was redacted (and unredacted by Italian newspapers) in the same way.
Stupid question: why is the government even allowed to redact stuff? Isn’t the government keeping secrets from the people totally antithetical to democracy?
It's up to us to keep the government accountable. Democracy dies if we don't put pressure on the government and participate actively in politics.
It's not the government, it's the department of justice. To name two: protection of witnesses, protection of state secrets ("the people" is not a person who can keep secrets).
Right, I’m aware of the excuses the government uses to keep secrets.
But on principle, what right does the government have to keep secrets from its own people? I don’t believe we had that button at the founding, it was added somewhere along the way. I’m asking what is the justification for this, and whether in the grand scheme of things that outweighs the principle of the government not being a separate entity from the people.
There are multiple ways to approach witness protection. For example if we have a problem with witnesses being harmed we could make being involved with witness harm at any layer of indirection a capital offense. We can probably think of other options besides the government being allowed to keep secrets from its own people.
Is the Department of Justice not a part of the government?
The TL;DR:
- To protect victims
- Redact people that are currently under investigation
But here they are clearly blacking out potential co-conspirators, without them being under investigation or having been charged with anything.
Seems like they are just blacking out powerful people so as not to embarrass or implicate them.
Because some are allegations without proof, and some are names of people who are victims. They have a right to privacy
Because the redaction was only supposed to protect the victims.
Competence and possibility of malicious compliance are interesting questions, but I think the more appropriate question is if DoJ will be sued for violating the law by redacting unrelated content?
Apart from the technological and procedural question, I would love to learn why the DOJ found it important to protect Indyke. He was Epstein's lawyer, and now we learn that he was personally involved. He is not a Washington person. We expected there to be politically motivated protection of certain people, but is the DOJ just going to blanket protect anybody in the docs?
Indyke works for other powerful people, runs in MAGA circles.
Two things come to mind:
* Some things Indyke did fall outside the scope of lawyer-client privilege. It would be bad for certain people to get him on a stand and force him to spill the beans. He was never interviewed re: Epstein [1]
* He's a very talented lawyer, insofar as a competent lawyer with, at least, extreme discretion, is talented.
[1] https://www.finance.senate.gov/imo/media/doc/letter_to_doj-f...
He was Epstein’s lawyer, he almost certainly has the dirt on anyone the DoJ wants to protect, and may be the kind of person that would be inclined to burn whoever DoJ was protecting if he wasn't getting treatment at least as favorable.
All you have to do is work for a MAGA person or MAGA billionaire donor for them to protect you.
From TFA:
> [Indyke] was hired by the Parlatore Law Group in 2022, before the justice department settled the Epstein case. That firm represents the defense secretary, Pete Hegseth, and previously represented Donald Trump in his defense against charges stemming from the discovery of classified government documents stored at Trump’s Florida estate.
So I don't know about "not a Washington person", but clearly connections exist to the current administration.
He was probably considered as a "victim" of having his crimes exposed...
He’s one of the executors of Epstein’s will. Better not piss him off.
Print on paper. Physically cut out the pieces you want to remove. Scan.
I still suspect that someone could undo this using data that may have been accidentally steganographed across the non-deleted parts of the image.
I think even after printing and scanning there could still be JPEG artifacts from the original (e.g. if you scan lossless).
However, I wonder whether heavily compressing the redacted image would help remove any unwanted artefacts. But the best solution is probably to render the original file from scratch, without compression, before redacting the image.
Not sure but that might actually add your printer's unique dots to the scanned image.
https://en.wikipedia.org/wiki/Printer_tracking_dots
Microdots may leak your identity this way (though I guess a really high resolution scan is needed for that)
It's no problem if they leak the fact that an FBI office printer was used to print the documents the FBI released.
i wouldn't trust any of these "undo's"
This is probably just pure stupidity, but part of me hopes there is some tech person in there who knew exactly what they were doing. I’d take a job as a tech person in this administration just to sabotage stuff like this.
What is the proper way to do this? I see a couple suggestions in the comments:
1. Draw a black box over it in image editor, save a screenshot
2. Crop the info out
Are there other good ways?
PDFs do have a "burn and destroy the parts/layers below" as part of the spec meant explicitly for redaction like this. Apparently they didn't use it, I guess?
A mafia state puts loyalists on top and can't produce anything (smart people leave), and smart people who think for themselves can't be promoted.
That's also why a mafia extorts and doesn't run complex businesses in general.
Perhaps the US can survive this administration. But somewhere down the line it will become broken.
The non-complex mafia businesses is moot since the 50ies already. They run Vegas, most of big sports leagues, politics, secret services and restaurant chains. Everything which can effectively wash money.
when i first saw this, i thought it was a meme. There is no way the DOJ could be so incompetent to fumble their own cover up.
There is a book by Richard Dawkins, "I am me I am free" or something like that, and its main picture is of Richard standing naked with a private part covered by a black rectangle. But my laptop back then was slow, and when you scrolled it would temporarily remove the square for a split second.
Are you sure? I can't find any trace of any book by Richard Dawkins with a title much like that, and that doesn't seem like a very on-brand sort of cover pic for a book by him, and an image search for "Richard Dawkins book cover" doesn't turn up anything like it.
Most likely "I Am Me, I Am Free: The Robot's Guide to Freedom. - David Icke"
Let's nobody make any fuss about this yet, lest they wise up before releasing the rest of the docs this way too!
Part of me wonders whether they had some of the text under the "redactions" changed too.
How is it done, from a technical point of view?
Layers.
PDF is an absurdly complex file format. It's part of the reason there is no single "good" PDF reader, just a lot of mediocre PDF readers that are all terrible in their own way. Which is a topic for another day.
There are several ways to remove data in a PDF:
- Remove the data. This is much harder than it sounds. Many PDF tools won't let you change the content of a PDF, not because it isn't possible, but because you'll likely massively screw up the formatting, and the tools don't want to deal with that.
- Replace the data. This is what all the "blackout" tools do: find "A" and replace it with "🮋". This is effective and doesn't break formatting since it's a 1-to-1 replacement. The problem with "replacing" is that not every PDF tool works the same way, and some, instead, just change the foreground and background color to black; it looks nearly the same, but the power of copy-and-paste still functions.
- Then you have the computer illiterate, who think changing the foreground and background color to black is good enough anyway.
This seems highly misleading.
> - Remove the data. This is much harder than it sounds. Many PDF tools won't let you change the content of a PDF, not because it isn't possible, but because you'll likely massively screw up the formatting, and the tools don't want to deal with that.
Compared to other formats this is actually relatively easy in a PDF since the way the text drawing operators work they don't influence the state for arbitrary other content. A lot of positioning in a PDF is absolute (or relative to an explicitly defined matrix which has hardcoded values). Usually this makes editing a PDF harder (since when changing text the related text does not adapt automatically), but when removing data it makes it much easier since you can mostly just delete it without affecting anything else. (There are exceptions for text immediately after the removed data, but that's limited and relatively easy to control.)
> - Replace the data. This is what all the "blackout" tools do: find "A" and replace it with "🮋". This is effective and doesn't break formatting since it's a 1-to-1 replacement.
That's actually rather tricky in PDFs since they usually contain embedded subset fonts and these usually do not have "🮋" as part of the subset. Also doing this would break the layout since "🮋" has a different width than most letters in a typical font, so it would not lead to less formatting issues than the previous option. Unless the "🮋" is stretched for each letter to have the same dimensions, but then the stretched characters allow to recover the text.
> The problem with "replacing" is that not every PDF tool works the same way, and some, instead, just change the foreground and background color to black; it looks nearly the same, but the power of copy-and-paste still functions.
PDF does not have a concept of a background color. If it looks like a background color in PDF, you have a rectangle drawn in one color and something in the foreground color in front of it. What you usually see in badly redacted PDF files is exactly this, but in the opposite order: someone just draws a black box on top of the characters. You could argue that this is smarter since it would still work even if someone changed the colors, but of course, PDF is a vector format. If you just add a rectangle, someone else can remove it again. (And copy & paste doesn't care about your rectangle either.)
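To make the "rectangle on top" point concrete, here is a rough sketch in Python. The content stream below is a hand-written, heavily simplified stand-in (real streams are usually compressed, use font-specific encodings, and have escaped strings), and `extract_text` and `strip_filled_rects` are made-up helper names, not part of any PDF library. It only illustrates that the text operators and the rectangle operators are independent of each other.

```python
import re

# Hypothetical, simplified page content stream: the text is drawn first,
# then a black fill colour is set and a rectangle is painted over the name.
content_stream = """
BT /F1 12 Tf 72 700 Td (Payment approved by ) Tj (John Doe) Tj ET
0 0 0 rg
180 695 60 14 re f
"""

def extract_text(stream: str) -> str:
    """Collect the literal strings shown by Tj operators; drawing operators are ignored."""
    return "".join(re.findall(r"\((.*?)\)\s*Tj", stream))

def strip_filled_rects(stream: str) -> str:
    """Delete 'x y w h re f' rectangle fills, which is all a cosmetic redaction adds."""
    return re.sub(r"[-\d.]+\s+[-\d.]+\s+[-\d.]+\s+[-\d.]+\s+re\s+f\s*", "", stream)

# The "redacted" name is still in the stream, with or without the box.
print(extract_text(content_stream))
print(extract_text(strip_filled_rects(content_stream)))
```

Both calls print the full sentence, because nothing about the rectangle touches the `Tj` strings. A real redaction has to remove or rewrite the text operators themselves.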
>- Remove the data. This is much harder than it sounds. Many PDF tools won't let you change the content of a PDF, not because it isn't possible, but because you'll likely massively screw up the formatting, and the tools don't want to deal with that.
>- Replace the data. This is what all the "blackout" tools do: find "A" and replace it with "🮋". This is effective and doesn't break formatting since it's a 1-to-1 replacement. The problem with "replacing" is that not every PDF tool works the same way, and some, instead, just change the foreground and background color to black; it looks nearly the same, but the power of copy-and-paste still functions.
You're making it sound way harder than it is, when both Adobe Acrobat and the built-in Preview app on Mac can competently redact documents. I'm not aware of instances of either (or any other purpose-made redaction tools) failing. I wouldn't homebrew a Python script to do my redaction either, but that doesn't mean doing redactions properly is some insurmountable task for some intern.
> Then you have the computer illiterate, who think changing the foreground and background color to black is good enough anyway
To be fair, this works if you print out those copies and then re-scan them.
Thanks for this. Really quells the urge I get every so often to just code my own PDF editor, because they all suck and certainly it couldn't be THAT hard. Such hubris!
qpdf has a redaction option. It’s routinely used to anonymize medical records for studies.
I remember reading the recommendation for journalists to redact documents is to black them out in the digital version, print it out, and re-scan it. Anything else has too many potential ways by which it might be possible to smuggle data.
Even that might be open to length attacks: one reasonable plaintext would lead to a black bar of 1135 px, another to 1138 px, and with enough redactions you can converge on what the plaintext might be.
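A toy version of that length attack, as a sketch only: the per-character widths below are invented numbers for an imaginary proportional font, the candidate names are made up, and `rendered_width` and `candidates_matching` are hypothetical helpers. A real attempt would need the actual font metrics, kerning, and scan resolution.

```python
# Hypothetical advance widths in pixels for an imaginary proportional font.
CHAR_WIDTH_PX = {
    "i": 4, "l": 4, "t": 5, "r": 6, "a": 9, "e": 9,
    "n": 10, "o": 10, "m": 15, " ": 5,
}
DEFAULT_WIDTH_PX = 9  # assumed width for any character not listed above

def rendered_width(text: str) -> int:
    """Estimate how wide the text would be when typeset, in pixels."""
    return sum(CHAR_WIDTH_PX.get(ch, DEFAULT_WIDTH_PX) for ch in text.lower())

def candidates_matching(bar_width_px: int, candidates: list[str], tolerance_px: int = 1) -> list[str]:
    """Keep only the candidates whose estimated width fits the measured black bar."""
    return [c for c in candidates if abs(rendered_width(c) - bar_width_px) <= tolerance_px]

names = ["Tom Lee", "Emma Moran", "Ian Little"]
print(candidates_matching(103, names))  # a wide bar rules out the short names
print(candidates_matching(58, names))   # a narrow bar leaves two candidates
```

One bar rarely pins down a single plaintext, but every additional bar for the same name shrinks the candidate set, which is the "converge" part.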
The only safe way for journalists is to paraphrase what the document said and to say "an unnamed source claims that ..." and to guarantee with your reputation, and the reputation of your publisher, that you are being faithful to what the original source said. For even better results, combine multiple sources.
Unfortunately paraphrasing things and taking editorial responsibility have both been deprecated in favour of rereleasing press releases in the house style, so it's difficult to get the actual journalism these days.
Mistaking redaction tool (replaces data with black square) and black highlighter (adds black square as another layer). If people doing redactions are computer-illiterate, they won't see the difference.
They drew black boxes over the text. The text is still underneath. On OCR'd scanned documents, the text you'd copy is actually stored in metadata and just linked by position to the image.
Anyway, if you click on a "redaction", you're clicking on the box and can't select the text underneath, but if you just highlight the text around it, you can copy all the original text.
It's a bizarre oversight.
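For anyone wondering how you would check a file for this, here is a minimal sketch of the idea in Python. It assumes you have already pulled the word positions of the text layer and the black rectangles out of the PDF with some tool; the coordinates, the sample words, and the helpers `overlaps` and `leaked_words` are all invented for illustration.

```python
from typing import NamedTuple

class Box(NamedTuple):
    x0: float
    y0: float
    x1: float
    y1: float

def overlaps(a: Box, b: Box) -> bool:
    """True if the two axis-aligned rectangles share any area."""
    return a.x0 < b.x1 and b.x0 < a.x1 and a.y0 < b.y1 and b.y0 < a.y1

# Hypothetical text layer: each OCR'd word with its position on the page.
text_layer = [
    ("Wire", Box(72, 700, 100, 712)),
    ("sent", Box(104, 700, 130, 712)),
    ("to", Box(134, 700, 146, 712)),
    ("Jane", Box(150, 700, 178, 712)),
    ("Roe", Box(182, 700, 206, 712)),
]

# Hypothetical black rectangle drawn over the name.
black_boxes = [Box(148, 698, 208, 714)]

def leaked_words(words, boxes):
    """Words that still exist in the text layer underneath a black box."""
    return [word for word, box in words if any(overlaps(box, r) for r in boxes)]

print(leaked_words(text_layer, black_boxes))
```

If that list is non-empty, the box is only cosmetic: the words under it are still in the file and will come along with any select-and-copy of the surrounding line.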
PDF is less like an image, and more like a web page where elements can be stacked on top of each other. You can visually obscure things by sticking a black rectangle over the top, but anyone who inspects inside the pdf can remove it or see the text in the source.
There would also be a mix of text documents, and image scans. The way to censor each is different.
Perfectly censoring documents, particularly digital ones is actually surprisingly difficult.
> Perfectly censoring documents, particularly digital ones is actually surprisingly difficult.
But the difficult part is easily repeatable once it's figured out, which is why it surprises me that it's not built into Acrobat as a tool already.
reminds me of that leaky redaction program that won the obfuscated c contest some years back
Probably the Underhanded C Contest (https://www.underhanded-c.org/_page_id_17.html) but yeah. Obfuscated C Contest entries usually aren't underhanded, just intentionally obscure about what they do or how they do it.
sorry, yes, that one.
Great contest. And a great entry, I had a big chuckle running it and unredacting my documents, even photos!
Can you post the document numbers, I can't find where these texts are in the original pdfs.
ah, found it - this is from the 'Court Records' part.
https://www.justice.gov/multimedia/Court Records/Matter of the Estate of Jeffrey E. Epstein, Deceased, No. ST-21-RV-00005 (V.I. Super. Ct. 2021)/2022.03.17-1 Exhibit 1.pdf
Link broken. (See reply with link that works.)
AKA https://www.justice.gov/multimedia/Court%20Records/Matter%20...
I wonder if it's purposeful misdirection
Doesn't work on any PDFs of scanned documents, for example the contacts list.
Copying and pasting doesn't work. Unless your PDF viewer does OCR. And if the redaction is just a black rectangle overlaid on top, that can still be removed.
I love how the entire internet thinks that this is a big deal when all that happened is that USDOJ re-posted some poorly-redacted court documents that were poorly redacted by non-USDOJ attorneys more than three years ago.
Yes, USDOJ is incompetent and dysfunctional, but this is not why. But sure, whatever, carry on...
Ctrl-c and ctrl-v are not hacks.
They are unredacted because either those in charge are not familiar with basic office tasks, or someone wanted this stuff to leak and nobody checked their work. Either brand of incompetence should cause heads to roll. But, just like the Signal fiasco, nothing will happen. When your brand is perfection, you cannot ever admit a mistake.
There are people here that would still vote for these evil people.
If you think mere human incompetence with documents is bad, imagine all the vibe coded apps.
See also: https://x.com/FaytuksNetwork/status/2003237895897780632
Am I crazy or didn't the same thing happen with Epstein's phone book some years ago? Coincidence?
Alright, now that everyone knows this, I hope people have backed up all the files to unredact everything before DOJ retracts the sensitive documents.
Lots of these redactions don't make sense unless they're made to protect the rich and powerful. Not surprising of course.
See also:
We Just Unredacted the Epstein Files
https://news.ycombinator.com/item?id=46364121
I tried to ascertain, but am not certain, this is the original blog source. Maybe they made some prior X posts.
It has become more plausible that nothing of value was released and the level of obviously poor redaction was done as a tarpit to own the libs.
So is the data extracted the names of the victims that were supposed to be hidden to protect them? Or is there something else that might be worthy of exposing?
It seems the redactions are to protect the perpetrators.
I'm seeing, for example, "Hyperion Air, Inc" was redacted.
Victim?
There are pages that are nothing but redacted text. It isn’t going to be a victims name copy pasted 80 times in a row…
>It isn’t going to be a victims name copy pasted 80 times in a row…
You can't possibly know that!
(Sorry, watching Grinch, Jim Carrey spoke through me).
i assume the downvoters don't see the importance of the question.
The downvoters assume that it is a bad-faith question. The downvoters are 99% right about that. If the 1% hits, then OP is just exceedingly naive and did not follow the scandal, in which case they should maybe first do some reading.
The names of involved powerful people were NOT supposed to be censored. All those names except Bill Clinton name were redacted. To protect Trump and everybody else involved in the scandal except said Bill Clinton. But especially to protect Trump.
ah yes, “hacks”
Trump's razor: Why attribute something to incompetence when you can attribute it to patriotic sabotage?
There's no patriotism here. That's just part of the cover for seeking power.
There's no patriotism in protecting chomos.
It's certainly possible that some of the underlings are deliberately sabotaging orders from above. It's also possible that they're incompetent, as so many of the Trump team are. How would we know which it is?
[dupe] https://news.ycombinator.com/item?id=46364121
We'll merge those comments hither.
Did we learn anything useful or is it exactly as I said in the other thread, which got downvoted to hell, that all the really juicy blackmail material is with the CIA and will never see the light of day?
Won't know until all the documents are released. The blackmail is undeniable. But what's more interesting is who else was involved. Who purchased his services? That's what they are trying to hide.
Do you have any evidence of that?
Of course they don't but it sounds truthy so give it a few rounds of the Internet whisper machine and it can become accepted fact everybody "knows".
hacks :facepalm:
"hacks"
copy and paste people, the idiots have taken over
This site has really gone downhill lately with drivel like this being upvoted. Any real developers on this site anymore?
Regardless of the content itself, naive redaction of a high profile PDF still exposing the text contents is something that seems relevant to the community. Maybe you are in the wrong place?
“Like you guys have had this stuff for a year. Doesn’t it seem like you could just throw all that into AI at this stage of the game? And just redact the names of the victims, and let’s go.” Joe Rogan
"hacks" lol. Next, ctrl+alt+del and its equivalents are gonna be called arcane theurgy
Hacks don’t have to be pretty — if it works it works. Here’s my “hack” to get into many school computer systems:
Username: admin
Password: password
it's even less impressive; somebody left the credentials typed into the text boxes and went to get a slimfast out of the staff breakroom and you walked into the computer lab and hit enter.
I think this is a good thing. I think the people talking dictator this and that do not understand we have the ability to critique the administration. What we lack is control of the underhanded lobbyism. It is a warped democracy but still a democracy.
You sure about that? https://www.usatoday.com/story/news/2025/12/18/larry-bushart...
Every slide towards authoritarianism is gradual, there is no announcement.
Bruh they're kidnapping people in the streets. They took over CBS and censored a documentary about CECOT.
Warped democracy.