Comment by quotemstr

5 days ago

:-) I'll never quite appreciate why people say things like this. Having some kind of embedded scripting is useful for all sorts of things, often form validation. A sufficiently complex validation system becomes Turing complete, so you might as well skip the hassle of a custom language and go right to JavaScript. Once you have JavaScript, input, and some way of updating a graphical pixel grid, you're at Doom-completeness. I think it's a wonderful, not terrible, thing that computation and programmability are so cheap they've become ubiquitous even in the most mundane applications

We had that language, it was postscript.

Then pdf came along and said: no this is too dangerous the only thing in a document should be layout information not arbitrary code.

And here we are two decades later.

My hatred of pdf has no end. It killed postscript for dynamic pages and djvu for static pages.

  • This is very misleading thinking. We've came a very long way from PS security-wise and this is a good thing and should be appreciated.

    The fallacy I see in many comments - either directly or between the lines - is to think that since we can run Doom in PDF, hell's gates must have opened and we can do literally anything, especially anything malicious.

    This is not the case.

    PDF is basically comprised of immutable parts and interactive elements that user agents are supposed to render visibly distinctly. Also user agents are not supposed to run any code without explicit user interaction.

    Contemporary user agents do a good job in both respects.

    PDFtris and the Doom example are possible because they live in a very small niche of features that enable relatively unobtrusive still interactive form processing. Forms allow code, but do not stick out as much as other interactive elements do and they are relatively flexible. Having found that feature niche is the real genius of PDFtris and related exploits.

    Still, they need user interaction. There is no way to do anything behind your back in PDF.

    Another fallacy I see in this and the related threads,is that Adobe Acrobat vulnerabilities are PDF vulnerabilities. Yes, Adobe did a terrible job with Acrobat, but in my opinion not at all with the format and specification of PDF - especially not when it comes to security.

  • > And here we are two decades later.

    The conclusion to draw from this is that the hypothesis "the only thing in a document should be layout information not arbitrary code." is wrong and misguided, since whatever the format is, in the end "nature" (us) will make it evolve in a way that has some amount of arbitrary scriptability ; if it's not JS in PDFs it will be ActiveX controls, a government-mandated proprietary app, having to do a trip to the city hall to have the clerk play an algorithm step-by-step by hand, or something else, but something will always eventually come up to fill that void and you will have to use it whether you like it or not.

  • > My hatred of pdf has no end. It killed postscript for dynamic pages and djvu for static pages.

    Interesting to see someone evoke DjVu.

    With the exception of IW44 wavelet compression, basically everything the DjVu file format supports has a PDF equivalent. I built a tool to convert DjVu to PDF that preserves the image layers and file structure with nearly equivalent compression.

    My tool did expose some edge cases in the PDF standard which was frustrating. For instance, PDF supports applying a bitonal mask to an image, but it does not specify how to apply it if the two images have different resolution (DPI). It took many years to get Apple to bring their implementation into consistency.

  • This is a very concise explanation, thanks for putting it so clearly. It’s not the features or requirements that are the focus of the scorn, per se, but how we got here. I still prefer and use PDF all the time, but between overly dynamic crap and the mainstream tooling, well… “hate” is a reasonable hyperbole.

    • Hate is too weak a term for what I feel for Adobe.

      Adobe kept PDF as a proprietary format from 1992 to 2008. You got the reader for free ... on windows, with a single executable. You didn't get an editor and had to pay through the nose for one from Adobe.

      It wasn't until the late 2010s that it actually became a free-ish standard, if you think that a 3,500 page document is a 'standard'.

      The only reason why adobe did it is because djvu was eating their lunch, between 2002 and 2008 it was the defacto standard for scanned documents in academia. The documents were easy to edit. The image compression is still better than the native compression on PDF.

      To add insult to injury after displacing postscript on windows in the name of security, not only did they add a scripting language to PDF, they added one written in two weeks at a time when it was so bad no one used it for anything but pop-ups and with more security vulnerabilities than you could shake a stick at. I suppose we should be happy Adobe didn't put flash in. Oh wait, they did: https://www.reddit.com/r/Adobe/comments/yqisho/flash_content...

JS is what made these file types into the Pretty Dangerous Format. Numerous vulnerabilities in Adobe Acrobat surfaced thanks to the embedded JS engine.

Updating the Acrobat client across an enterprise used to be quite burdensome.

  • The flip side is that because the industry has converged on just a few embedded scripting systems (JS, Lua, etc.) we can concentrate our security hardening efforts on these few engines and benefit everyone. If PDF, like PostScript, were its own custom thing, it couldn't have been able to benefit from this hardening. In the end, JS was a fine choice.

    • The concern isn't that it was JS, the concern is that there's a scripting system inside of PDF at all. Why? What? Form validation is a lousy excuse because forms themselves were a bridge too far for the format. Why do we need to be able to validate them?

      I knew PDFs could be dangerous, but I didn't realize it was because they're intentionally designed to allow embedded scripts.

      3 replies →