Comment by ogurechny

1 month ago

It started in the '80s. PostScript was the big deal. It was a printer language, not a document language. It was not limited to “(mostly) text documents”, even though complex vector fonts and even hinting were introduced. For example, you could print some high quality vector graphs in native printer resolution from systems which would never ever get enough memory to rasterise such giant bitmaps, by writing/exporting to PostScript. That's where Adobe's business was. See also NeWS and NeXT.

However, arbitrary non-trivial PostScript files were of little use to people without a hardware or software rasteriser (and sometimes fonts matching the ones the author had, and sometimes the specific brand of RIP matching the quirks of authoring software, etc.), so it was generally used by people in publishing or near it. PDF was an attempt to make a document distribution format which was more suitable to more common people and more common hardware (remember the non-workstation screen resolutions at the time). I doubt that anyone imagined typical home users writing letters and bulletins in Acrobat, of all things (though it does happen). It would be similar to buying Photoshop to resize images (and waiting for it to load each time). Therefore, competitor to Word it was not. Vice versa, Word file was never considered a format suitable for printing. The more complex the layout and embedded objects, the less likely it would render properly on publisher's system (if Microsoft Office did exist for its architecture at all). Moreover, it lacked some features which were essential for even small scale book publishing.

Append-only or versioned-indexed chunk-based file formats for things we consider trivial plain data today were common at the time. Files could be too big to rewrite completely each time even without edits, just because of disk throughput and size limits. The system could not be able to load all of the data into memory because of addressing or size limitations (especially when we talk about illustrations in resolutions suitable for printing). Just like modern games only load the objects in player's vicinity instead of copying all of the dozens or hundreds of gigabytes into memory, document viewers had to load objects only in the area visible on screen. Change the page or zoom level, and wait until everything reloads from disk once again. Web browsers, for example, handle web pages of any length in the same fashion. I should also remind you that default editing mode in Word itself in the '90s was not set to WYSIWYG for similar performance reasons. If you look at the PDF object tree, you can see that some properties are set on the level above the data object, and that allows overwriting the small part of the index with the next version to change, say, position without ever touching the chunk in which the big data itself stays (because appending the new version of that chunk, while possible, would increase the file size much more).

Document redraw speed can be seen in this random video. But that's 1999, and they probably got a really well performing system to record the promotional content. https://www.youtube.com/watch?v=Pv6fZnQ_ExU

PDF is a terrible format not because of that, but because its “standard” retroactively defined everything from the point of view of Acrobat developer, and skipped all the corner cases and ramifications (because if you are an Acrobat developer, you define what is a corner case, and what is not). As a consequence, unless you are in a closed environment you control, the only practical validator for arbitrary PDFs is Acrobat (I don't think that happened by chance). The external client is always going to say “But it looks just fine on my screen”.

0 comments

ogurechny

No comments yet

Contribute on Hacker News ↗