Comment by gu009

12 hours ago

A handy side use for this is compressing PDFs.

For some reason, printing a single page of an Excel or Word document to PDF often produces a file of around 4MB. Passing it through Dangerzone shrinks it considerably.

Just ran a quick test:

- 1-page Excel PDF export: 3.7MB

- Processing with Dangerzone (OCR enabled): 131KB

I wonder if the Excel export retains a lot of document structure in case the PDF is imported back into Excel later.

  • Fun trivia: XLSX, DOCX, and PPTX files are ZIP archives of XML files. You can rename one to a ".zip" extension, extract it, and open the XML parts in Notepad to see their raw contents.

    Similarly, you can use qpdf or PDFEdit to inspect a PDF's raw code.

    https://stackoverflow.com/a/6562443

    And thus, you can compare the raw XLSX (XML) vs raw PDF.
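
If you'd rather not rename and unzip by hand, here is a minimal Python sketch for peeking at the XML side of that comparison. It relies only on the standard-library zipfile module; the helper names `list_ooxml_parts` and `read_part` are made up for illustration, and it assumes the file is an ordinary, unencrypted XLSX/DOCX/PPTX with UTF-8 encoded XML parts.

```python
import zipfile


def list_ooxml_parts(path):
    """Return (part name, uncompressed size in bytes) for each entry in the package."""
    # XLSX/DOCX/PPTX files are ZIP archives, so zipfile can open them directly.
    with zipfile.ZipFile(path) as package:
        return [(info.filename, info.file_size) for info in package.infolist()]


def read_part(path, part_name):
    """Return the text of one part, assuming it is UTF-8 encoded XML."""
    with zipfile.ZipFile(path) as package:
        return package.read(part_name).decode("utf-8")
```

For example, `list_ooxml_parts("report.xlsx")` (a hypothetical file name) lists the parts and their sizes, and `read_part("report.xlsx", name)` with one of those part names prints the raw XML you would otherwise open in Notepad.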