Comment by anigbrowl

18 days ago

I found this part interesting:

> There are also other documents that appear to simulate a scanned document but completely lack the “real-world noise” expected with physical paper-based workflows. The much crisper images appear almost perfect without random artifacts or background noise, and with the exact same amount of image skew across multiple pages. Thanks to the borders around each page of text, page skew can easily be measured, such as with VOL00007\IMAGES\0001\EFTA00009229.pdf. It is highly likely these PDFs were created by rendering original content (from a digital document) to an image (e.g., via print to image or save to image functionality) and then applying image processing such as skew, downscaling, and color reduction.
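
For anyone wanting to check the "exact same skew on every page" claim themselves, here is a rough sketch of the arithmetic, assuming you have already read off the pixel coordinates of two points along a page's top border (for example in an image viewer). The function name `skew_degrees` and the coordinates in the usage note are made up for illustration; this is not the method the article's authors used.

```shell
#!/usr/bin/env bash
# skew_degrees X1 Y1 X2 Y2
# Prints the angle, in degrees, of the line through two points on a border.
# Image coordinates have y growing downward, so a positive result means the
# right-hand end of the border sits lower on the page than the left-hand end.
skew_degrees() {
  awk -v x1="$1" -v y1="$2" -v x2="$3" -v y2="$4" 'BEGIN {
    pi = atan2(0, -1)                       # awk has no built-in pi constant
    printf "%.2f\n", atan2(y2 - y1, x2 - x1) * 180 / pi
  }'
}
```

Calling `skew_degrees 100 200 1100 207` (a border that drops 7 pixels over 1000) gives roughly 0.4 degrees. A real scanner feed would be expected to produce slightly different values from page to page, so identical results across many pages would be consistent with what the article describes.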

52 comments


tombrossman 18 days ago

GNOME Desktop users can put this in a Bash script in ~/.local/share/nautilus/ for more convincing looking fake PDF scans, accessible from your right-click menu. I do not recall where I copied it from originally to give credit so thanks, random internet person (probably on Stack Exchange). It works perfectly.

  ROTATION=$(shuf -n 1 -e '-' '')$(shuf -n 1 -e $(seq 0.05 .5))

  for pdf in "$@";
    do magick -density 150 "$pdf" \
              -linear-stretch '1.5%x2%' \
              -rotate 0.4 \
              -attenuate '0.01' \
              +noise  Multiplicative \
              -colorspace 'gray' \
              "${pdf%.*}-fakescan.${pdf##*.}"
  done

barrkel 18 days ago

That seq is probably supposed to be $(seq 0.05 0.05 0.5). Right now it's always 0.05.
Note that you can get random numbers straight from bash with $RANDOM. It's 15 bit (0 to 32767) but good enough here; this would get between 0.05 and 0.5: $(printf "0.%.4d\n" $((500 + RANDOM % 4501)))
streetfighter64 18 days ago
Shouldn't $ROTATION be set inside the loop and actually used in the magick command?
- tombrossman 18 days ago
  
  You know, now that you point it out that seems obvious. I think maybe I was experimenting with rotation and left that in, unused. I did this years ago. The loop works OK though. Thanks for the feedback (and now I have to finish editing that script ...)
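
A sketch of the script with both replies folded in, assuming bash and an ImageMagick 7 install that provides the `magick` command. The helper name `random_rotation` is hypothetical; the angle formula is the `$RANDOM` one suggested above, and the ImageMagick options are unchanged from the original post apart from the rotation now being used.

```shell
#!/usr/bin/env bash
# random_rotation prints a signed angle in degrees whose magnitude lies
# between 0.0500 and 0.5000, using bash's built-in $RANDOM (0 to 32767).
random_rotation() {
  local sign=''
  (( RANDOM % 2 )) && sign='-'              # rotate left or right at random
  printf '%s0.%04d\n' "$sign" $(( 500 + RANDOM % 4501 ))
}

for pdf in "$@"; do
  rotation=$(random_rotation)               # fresh angle for every file
  magick -density 150 "$pdf" \
         -linear-stretch '1.5%x2%' \
         -rotate "$rotation" \
         -attenuate '0.01' \
         +noise Multiplicative \
         -colorspace 'gray' \
         "${pdf%.*}-fakescan.${pdf##*.}"
done
```

Note that the angle is picked once per file, so every page within one PDF still gets the same skew, which is exactly the kind of uniformity the article treats as a tell.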
lordgrenville 18 days ago
Nothing about this is specific to GNOME, right? ImageMagick is cross-platform.
- turboponyy 18 days ago
  
  I guess the GNOME-specific part is that GNOME comes with the Nautilus file browser, and the instructions add a script for Nautilus.
  But yeah, this will work as long as you have ImageMagick and Nautilus installed.
  
mimischi 18 days ago

I like https://lookscanned.io/
landdate 17 days ago
[flagged]
- taskforcegemini 17 days ago
  
  you sound as grumpy as my cat looks. there's no need for this language
- landdate 17 days ago
  
  [flagged]

nullbio 18 days ago

The real question is: Which of the documents are the ones that are "simulating" scanned documents, and what political narrative do they reinforce?

The only reason I can think of for why someone would want to do this is to pass off fraudulent or AI generated images as real.

boromisp 18 days ago

A simpler explanation could be wanting to skip the print->sign->scan ceremony required by some institutions.
reactordev 18 days ago

This. Slip in a few thousand “fakes” with the trove of goods to be able to fabricate a narrative.
lucideer 17 days ago

Another explanation is that it's simply one form of lazy, ineffective obfuscation performed by inexperienced relative luddites in an attempt to walk the fine line between complying with the Supreme Court directive & not releasing anything useful.
Other investigations into the files have found oddities like redaction of the word "don't" indicating a haphazard find-&-replace approach to redaction, possibly LLM-aided.
The DOJ/Akamai online hosted search feature is also incomplete - potentially due to some of these "digitally scanned" files not being subject to OCR.
lucideer 17 days ago

> to pass off fraudulent or AI generated images as real.
Possibly, but I don't find it compelling, if only because a significant portion of the media reportage on the files has made claims that are entirely baseless. If there were a narrative to be sold, one would expect such reportage to be actively leveraging such fraudulent images.

streetfighter64 18 days ago

Very interesting. That document in particular seems to be an interview of A. Acosta by the DoJ from 2019. But what reason would the FBI have for pretending it's a scanned document, if it is genuine? Perhaps there's some aspect of Epstein's deal with Acosta that they'd rather not reveal to the public?

https://www.justice.gov/epstein/files/DataSet%207/EFTA000092...

juujian 18 days ago
Not that I can speak from personal experience or anything... But somebody on an email chain may have requested a scanned version of the document to ensure there is no metadata and the employee might have found it easier to just flatten the pdf and apply a graphical filter to make the document appear like a scanned document. There might even be a webtool available somewhere to do so, I wouldn't know...
- agopo 18 days ago
  
  [dead]
  
- mikkupikku 18 days ago
  
  > the employee might have found it easier to just flatten the pdf and apply a graphical filter to make the document appear like a scanned document
  Is that remotely plausible? I can't imagine faking a scan being easier than just walking down the hall to the copier room.
  
breppp 18 days ago

I am only guessing that they had to remove the document from a classified network in a way where data won't possibly leak
draw_down 18 days ago

[dead]

zoky 18 days ago

Such a weird way to do it when it would be vastly easier to just blow the document out to paper and re-scan it.

brazzy 18 days ago
Vastly easier when you do it to one or a handful of documents.
But if you want to do it to 2000 documents...
- fc417fc802 17 days ago
  
  But at that point why bother with the fakery? Why does it matter if it's obviously of digital origin? As long as it's rendered down to an image, problem solved.
  Was the motivation for this benign (an employee skirting regulations) or malicious?
- pbhjpbhj 18 days ago
  
  4 reams (4×500) is hardly a lot for commercial equipment to handle - paper trays will take a ream at a time. Document analysis would still show some shenanigans were in play, but you'd get a bit of variation at least.
userinexperienc 18 days ago

[dead]

hiccuphippo 18 days ago

I mean, I do that all the time when they ask me to print something, sign it, and then scan it.

Sign a blank paper, scan it, paste the original doc on it. Then keep the scan for future docs.

foxglacier 18 days ago

An easier trick I've used is to just sign directly on the computer screen over the displayed document with a whiteboard marker and take a photo with my phone.