Comment by layer8
1 day ago
These PDFs apparently used the “incremental update” feature of PDF, where edits to the document are merely appended to the original file.
It’s easy to extract the earlier versions, for example with a plain text editor. Just search for lines starting with “%%EOF”, and truncate the file after that line. Voila, the resulting file is the respective earlier PDF version.
(One exception is the first %%EOF in a so-called linearized PDF, which marks a pseudo-revision that is only there for technical reasons and isn’t a valid PDF file by itself.)
New OSINT skill unlocked
I see an interesting parallel to how people think about captured encrypted data, and how long that encryption needs to be effective for until technology catches up and can decrypt (by which point, hopefully the decrypted data is worthless). If all of these documents are stored in durable archives, future methodologies may arrive to extract value or intelligence not originally available at the time of capture and disclosure.
> If all of these documents are stored in durable archives, future methodologies may arrive to extract value or intelligence not originally available at the time of capture and disclosure.
I recently learned that some people improve or brush up on their OSINT skills by trying to find missing people!
It's hilarious the extent to which Adobe Systems's ridiculously futile attempt to chase MS Word features ended up being the single most productive espionage tool of the last quarter century.
I don’t think this was particularly modeled on MS Word. The incremental update feature was introduced with PDF 1.2 in 1996. It allows to quickly save changes without having to rewrite the whole file, for example when annotating a PDF.
Incremental updates are also essential for PDF signatures, since when you add a subsequent signature to a PDF, you couldn’t rewrite the file without breaking previous signatures. Hence signatures are appended as incremental updates.
PDF files are for storing fixed (!!) output of printed/printable material. That's where the format's roots are via Postscript, it's where the format found its main success in document storage, and it's the metaphor everyone has in mind when using the format.
PDFs don't change. PDFs are what they look like.
Except they aren't, because Adobe wanted to be able to (ahem) "annotate" them, or "save changes" to them. And Adobe wanted this because they wanted to sell Acrobat to people who would otherwise be using MS Word for these purposes.
And in so doing, Adobe broke the fundamental design paradigm of the format. And that has had (and continues to have, to hilarious effect) continuing security impact for the data that gets stored in this terrible format.
4 replies →
I'm pretty sure you can change various file formats without rewriting the entire file and without using "incremental updates".
19 replies →