Comment by gu009
10 hours ago
A handy side use for this is compressing PDFs.
For some reason, printing a single page of an Excel or Word document to PDF often produces a file of around 4MB. Passing it through this shrinks it considerably.
Just ran a quick test:
- 1-page Excel PDF export: 3.7MB
- Processing with Dangerzone (OCR enabled): 131KB
I wonder if the Excel export retains a lot of document structure in case the file gets imported back into Excel later.
Fun trivia: XLSX, DOCX, and PPTX files are just ZIP archives of XML files. Rename one to a ".zip" extension and extract it to see the raw contents.
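If you'd rather not rename anything, here's a rough sketch that peeks inside with Python's standard zipfile module, relying on the fact that the container is a ZIP archive of XML parts. The helper names and the tiny demo archive are made up for illustration; a real XLSX has many more parts.

```python
import io
import zipfile


def list_ooxml_parts(source):
    """Return the names of the parts stored in an XLSX/DOCX/PPTX container.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        return archive.namelist()


def read_ooxml_part(source, part_name):
    """Return one part (e.g. an XML file inside the container) as text."""
    with zipfile.ZipFile(source) as archive:
        return archive.read(part_name).decode("utf-8")


def build_demo_archive():
    """Hypothetical stand-in for a real XLSX: a ZIP with a single XML part."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("xl/workbook.xml", "<workbook/>")
    buffer.seek(0)
    return buffer
```

With a real file you'd call something like list_ooxml_parts("report.xlsx"), where the file name is just a placeholder.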
For the PDF side, you can use qpdf or PDFEdit to inspect a PDF's raw code:
https://stackoverflow.com/a/6562443
That way, you can compare the raw XLSX (XML) against the raw PDF.
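For the qpdf route, a minimal sketch of how I'd script it, assuming qpdf is installed and on your PATH. It uses qpdf's QDF mode, which rewrites the PDF into a form you can read in a text editor. The function names and file names are my own placeholders.

```python
import shutil
import subprocess


def qpdf_qdf_command(src, dst):
    """Build the qpdf command line that writes a text-editor-friendly copy.

    --qdf selects QDF mode (uncompressed, normalized output);
    --object-streams=disable keeps objects out of compressed object streams.
    """
    return ["qpdf", "--qdf", "--object-streams=disable", src, dst]


def expand_pdf(src, dst):
    """Run qpdf on `src`, writing the expanded copy to `dst`."""
    if shutil.which("qpdf") is None:
        raise RuntimeError("qpdf was not found on PATH")
    subprocess.run(qpdf_qdf_command(src, dst), check=True)
```

Something like expand_pdf("export.pdf", "export.qdf.pdf") should then give you a file you can open next to the extracted XML.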