Comment by sieste

7 hours ago

Because PDF is so popular, there is a lot of demand for PDF processing tools. And the format is so complex that there are many nontrivial and creative ways to do PDF processing. That's why these "Hello World" projects usually make the Top 5 on HN, and one of the upvotes is usually from me.

>many nontrivial and creative ways to do PDF processing

They all wrap PDFlib and provide the same functionality.

  • I am already well served by Ghostscript, GIMP, ImageMagick, etc.:

    Optimize PDF:

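        #!/bin/bash
        # Recompress a PDF in place; abort before overwriting if gs fails.
        set -euo pipefail
        INPUT="$1"
        OUTPUT="$(mktemp --suffix=.pdf)"   # --suffix is a GNU mktemp option
        gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook \
           -dNOPAUSE -dQUIET -dBATCH -sOutputFile="$OUTPUT" "$INPUT"
        mv "$OUTPUT" "$INPUT"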
    

    Merge PDF:

        #!/bin/sh
        gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite \
          -dCompatibilityLevel=1.3 -dPDFSETTINGS=/ebook \
          -sOutputFile=merged.pdf "$@"
    

    And so on and so forth.
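
    For instance, pulling a page range out of a file follows the same pattern. This is only a sketch: `extract_pages` and the `DRY_RUN` switch are names made up for illustration, while `-dFirstPage` and `-dLastPage` are regular Ghostscript options. With `DRY_RUN=1` it prints the command instead of running it, so you can check what would be executed.

```shell
#!/bin/sh
# Sketch: extract pages FIRST..LAST of a PDF with Ghostscript.
# extract_pages and DRY_RUN are illustrative names, not part of Ghostscript.
extract_pages() {
    first="$1"; last="$2"; input="$3"; output="$4"
    # If DRY_RUN is set and non-empty, prefix the command with echo
    # so it is printed rather than executed.
    ${DRY_RUN:+echo} gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite \
        -dFirstPage="$first" -dLastPage="$last" \
        -sOutputFile="$output" "$input"
}

# Usage (hypothetical file names):
#   extract_pages 2 5 input.pdf pages-2-5.pdf
```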

    Moreover, when I see a webapp, I immediately assume everything I do in it is exfiltrated and abused.

    I can check that the webapp advertised above is indeed local-first, but I can't be 100% sure they don't steal my data in a way I did not foresee, e.g. via websockets or cookies.

    I learnt this the hard way from being on Instagram and Gmail.

    • Your commands to process PDF with Ghostscript are lossy (they lose lots of metadata and in minor ways they also change how the PDF renders), and they produce very large PDF files.
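
      One way to guard against the size problem, at least, is to keep the rewritten file only when it is actually smaller. A minimal sketch, assuming the rewritten PDF has been written to a separate temporary file first; `keep_if_smaller` is a hypothetical helper name, and it does nothing about lost metadata or rendering differences.

```shell
#!/bin/sh
# Sketch: replace ORIGINAL with CANDIDATE only if CANDIDATE is smaller.
# keep_if_smaller is an illustrative helper name.
keep_if_smaller() {
    original="$1"; candidate="$2"
    # wc -c prints the byte count; the arithmetic expansion strips
    # any padding some wc implementations add.
    orig_size=$(( $(wc -c < "$original") ))
    cand_size=$(( $(wc -c < "$candidate") ))
    if [ "$cand_size" -gt 0 ] && [ "$cand_size" -lt "$orig_size" ]; then
        mv "$candidate" "$original"
        echo "replaced: $orig_size -> $cand_size bytes"
    else
        # An empty or larger candidate is discarded; the original stays as-is.
        rm -f "$candidate"
        echo "kept original: $orig_size bytes (candidate was $cand_size)"
    fi
}
```

      In the optimize script above, this would take the place of the final unconditional `mv`.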


    • For better compression, my personal preference is:

          pdftops -paper A4 -expand -level3 file.pdf file.ps # I'm in the EU, so A4 is my usual paper size
      
          ps2pdf14 -dEmbedAllFonts=true        \
          -dUseFlateCompression=true           \
          -dOptimize=true                      \
          -dProcessColorModel=/DeviceRGB       \
          -r72                                 \
          -dDownsampleGrayImages=true          \
          -dGrayImageResolution=150            \
          -dAutoFilterGrayImages=false         \
          -dGrayImageDownsampleType=/Bicubic   \
          -dDownsampleMonoImages=true          \
          -dMonoImageResolution=150            \
          -dMonoImageDownsampleType=/Subsample \
          -dDownsampleColorImages=true         \
          -dColorImageResolution=150           \
          -dAutoFilterColorImages=false        \
          -dColorImageDownsampleType=/Bicubic  \
          -dPDFSETTINGS=/ebook                 \
          -dNOSAFER                            \
          -dALLOWPSTRANSPARENCY                \
          -dShowAnnots=false                   \
            file.ps compressed.pdf


    • You're being downvoted because not everyone has CLI access to a server and the required ghostscript binaries etc.

      Realistically, most 'normal users' have PDF needs like the ones these sites cover, and we as tech people can safely give these sites to non-technical people, confident that their data isn't being stolen by dodgy remote servers (think gas/electricity bills, invoices, bank statements, etc., which are a PII gold mine).
