
Comment by mrb

4 years ago

I have a shell script based on ImageMagick that gives a PDF a "scanner" look. I typically open the PDF in Master PDF Editor to insert an image of my signature, then pass it through my script. I rarely need it, but when I do, it's a real lifesaver. It has saved me from printing and scanning 100+ pages for a mortgage company, some stock brokers, and banks. Key points of the script:

"+noise Random -fill white -colorize 95%" to add some noise to the image

"-distort ScaleRotateTranslate '$x,$y $angle'" to randomly shift the document horizontally and vertically, and rotate it slightly

"-density 150" for a low-ish resolution so it better hides the fact the PDF wasn't really scanned

"-colorspace Gray" to make it grayscale

"-quality 60" to increase JPG compression and somewhat reduce picture quality

  #!/bin/bash
  # Make a pdf look like it was scanned.
 
  if [ $# -ne 2 ]; then
      echo "Usage: $0 input output" >&2
      exit 1
  fi
  tmp="$1".scanner-look.tmp
  mkdir "$tmp" &&
  # without -flatten some PDF convert to a JPG with a black background
  convert -density 150 "$1" -colorspace Gray -quality 60 -flatten "$tmp"/p_in.jpg &&
  : || exit 1
  # each page is randomly shifted in the X and Y plane.
  # units seem to depend on angle of rotation in ScaleRotateTranslate?
  offset() { echo $(($RANDOM % 1000)); }
  for f in "$tmp"/p_in*jpg; do
      # each page is randomly rotated by [-0.5 .. 0.5[ degrees
      angle=$(python -c 'import random; print(random.random()-0.5)')
      x=$(offset)
      y=$(offset)
      convert "$f" \
        -blur 0x0.5 \
          -distort ScaleRotateTranslate "$x,$y $angle" +repage \
        \( +clone +noise Random -fill white -colorize 95% \) \
        -compose darken \
        -composite \
        "${f/p_in/p_out}.pdf" || exit 1
  done
  # concatenate all the pages to one PDF
  # use "ls -v" to order files correctly (p_out-X.jpg.pdf where X is 0 1 2 ... 9 10 11 ...)
  pdftk $(ls -v "$tmp"/p_out*.pdf) cat output "$2" &&
  rm -rf "$tmp"
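The script above shells out to Python only to draw the rotation angle. If Python is not installed, roughly the same range can be produced in bash alone. A minimal sketch, assuming three decimal places are enough; `random_angle` is a hypothetical helper name, not part of the original script:

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): print a random angle
# in [-0.500, 0.499] degrees using only bash arithmetic.
random_angle() {
    # $RANDOM is 0..32767, so "% 1000" is very slightly biased;
    # that is assumed to be acceptable for a cosmetic effect.
    local n=$(( RANDOM % 1000 - 500 ))   # integer in [-500, 499]
    local sign=""
    if (( n < 0 )); then
        sign="-"
        n=$(( -n ))
    fi
    # Render the integer as thousandths of a degree, e.g. -37 -> -0.037
    printf '%s0.%03d\n' "$sign" "$n"
}
```

It could stand in for the `python -c` call as `angle=$(random_angle)`.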

I have a script for the same purpose too, but I prefer a black-and-white 1-bit palette for that fax look. Here's my version. Note that it uses GraphicsMagick, img2pdf, optipng, pdftk, and GNU parallel. It also enforces A4, so some of you may want to change that. For fun, it processes the pages in parallel to speed things up a bit on large documents.

    #!/bin/bash

    # Adds a bad scanning effect to PDF files.

    if [ $# -ne 2 ]; then
      echo 1>&2 "Usage: $0 input.pdf output.pdf"
      exit 3
    fi

    convertPage() {
      # PDF filename in first parameter, page in second
      file=$1
      page=$(($2-1))
      png=$(printf "pdf2scan-page-%05d.png" $2)

      # Convert PDF page to black and white PNG
      gm convert -density 300 "$file"[$page] +dither -rotate 0.35 +noise Gaussian -type bilevel -fill white -fuzz 90% -colors 2 $png

      # Optimize PNG
      optipng -silent $png
    }

    export -f convertPage

    # Read number of pages
    pages=$(pdftk "$1" dump_data | grep NumberOfPages | sed 's/[^0-9]*//')

    # Loop through pages and convert in parallel
    for i in $(seq 1 $pages)
    do
      echo "$1":::$i
    done | parallel --eta --colsep ':::' convertPage {1} {2}

    # Create PDF from PNGs
    img2pdf -o "$2" --producer "" --pagesize A4 pdf2scan-page-*.png

    # Remove temporary files
    rm pdf2scan-page*

For a cleaner 1-bit look without noise and rotation, use "gm convert -density 300 "$file"[$page] +dither -colors 2 -type bilevel -fill white -fuzz 40% $png".
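The script above writes its intermediate PNGs into the current directory and removes them with a glob at the end, so a failed run leaves them behind. One possible variant keeps them in a private scratch directory that is removed when the script exits. This is only a sketch: `workdir` and `page_png` are hypothetical names that do not appear in the original script.

```shell
#!/bin/bash
# Sketch only: workdir and page_png are hypothetical names.

# Create a private scratch directory and remove it when the script exits.
workdir=$(mktemp -d) || exit 1
trap 'rm -rf "$workdir"' EXIT
export workdir   # so functions run through "export -f" and parallel see it

# Build the PNG path for a 1-based page number,
# e.g. page 7 -> $workdir/pdf2scan-page-00007.png
page_png() {
    printf '%s/pdf2scan-page-%05d.png' "$workdir" "$1"
}
export -f page_png
```

With this in place, `png=$(page_png "$2")` would replace the `printf` line in `convertPage`, the final step would read `img2pdf -o "$2" --producer "" --pagesize A4 "$workdir"/pdf2scan-page-*.png`, and the explicit `rm` would no longer be needed.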

  • The 1-bit palette is a good touch. Making it use parallel(1) is a great and easy optimization. Nice!

Kind-of-related: I'm wondering if anyone can help me find a website I found a long time ago (probably through StumbleUpon, if that tells you anything about how long ago)

It was a "government document simulator." What you would do is upload a nicely scanned document, and it'd give you back a misaligned, crappy quality "scan" of that document, with random blotches and other visual noise. You know, like regular government/FOIA-received documents.

I feel like this is halfway there, if not more (so thank you!), but that website was so authentic.

I don't know if it's even around, but it made me giggle, and I'd like to find it again. If not--great startup idea!

Thanks for this!

"-flatten" results in all PDF pages being rendered into a 1 page PDF output. If "-flatten" is removed, I get a multi-page PDF output as expected. Thoughts?

EDIT: "-flatten" does what it is supposed to. Remove it when operating on a multi-page PDF.

  • Weird. I could swear "-flatten" didn't behave like this years ago when I last used my script. But maybe I am misremembering...

    Edit: haha! The "-flatten" needs to be replaced with "-alpha flatten". This way, multi-page documents are still handled correctly, and alpha transparency is also handled correctly. I just tried on this sample file with transparent images: https://tcpdf.org/files/examples/example_042.pdf

    • Changing "-flatten" to "-alpha flatten" (without the double quotes) results in an error for me.

      > convert: UnrecognizedAlphaChannelOption `flatten' @ error/convert.c/ConvertImageCommand/673.
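For what it's worth, ImageMagick's "-alpha" option does accept a "remove" method, which composites each image over the current "-background" color without merging the pages into one. A hedged sketch of how the rasterizing step might look with it; `rasterize` is a hypothetical wrapper name, and whether this matches what the earlier comment intended is an assumption:

```shell
#!/bin/bash
# rasterize: hypothetical wrapper, not taken from the original script.
# $1 = input PDF, $2 = output JPG name (one file per page is written
# when the input has several pages).
rasterize() {
    # "-background white -alpha remove" blends transparency into white on
    # every page separately; "-alpha off" then drops the alpha channel.
    # Unlike "-flatten", this does not collapse all pages into one image.
    convert -density 150 "$1" -colorspace Gray -quality 60 \
        -background white -alpha remove -alpha off "$2"
}
```

In the original script this would be called as `rasterize "$1" "$tmp"/p_in.jpg` in place of the `convert ... -flatten` line. The "remove" method needs a reasonably recent ImageMagick, so an older install may still reject it.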
