
Comment by twelve40

17 days ago

Isn't it amazing how one company invented a universally adopted format that takes structured data from an editor (except images, obviously) and converts it into a completely fucked-up unstructured form that then requires expensive voodoo magic to convert back into structured data?

What would it have taken to store the plain text in some meta field in the document? Argh, so annoying.

  • PDF provides that capability, but editors don't produce it, probably because printing goes through OS drivers that don't support it, or through PDF generators that don't support it. Or they do support it, but users don't know to check that option, or they turn it off because it makes PDFs too large.

  • PDF supports that just fine. It's just that many PDF publishers choose not to use that.

    You can lead a horse to water...

PDFs began as just PostScript commands stored in a file. It's a genius hack, in a way, that has become a Frankenstein's monster.

People kind of dump whatever in PDF files, so I don't think a cleaner file format would do as much as you might think.

Digital fax services will generate PDF files, for example. They're just image data dumped into a PDF. Various scanners will also do so.

Is "put this glyph at coordinate (x,y)" really what you'd call "structured"?

  • It's not the kind of structure that allows meaningful understanding.

    Something that was clearly a table now becomes a bunch of glyphs physically close to each other, as opposed to some other group of glyphs. Considered as a group, they form a box visually separated from another group of glyphs, even though both are actually part of the same table.
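
To make the "bunch of glyphs close to each other" point concrete, here is a rough sketch of the kind of guesswork needed to get a table back out of positioned glyphs. Everything in it is an assumption for illustration: the `(x, y, char)` tuples, the fixed character width, the tolerances, and the helper names `place`, `group_rows`, `split_cells` and `reconstruct_table` are all made up, and no real PDF library is involved. Real documents have variable-width fonts, rotated text and merged cells, so actual extractors need far more than this.

```python
def place(text, x0, y, char_width=6.0):
    """Hypothetical helper: emit one (x, y, char) tuple per glyph, like a page description would."""
    return [(x0 + i * char_width, y, ch) for i, ch in enumerate(text)]


def group_rows(glyphs, y_tol=2.0):
    """Guess rows: glyphs whose y is within y_tol of the current row's first glyph share a row."""
    rows = []
    # Sort top of page first (assuming y grows upward, as in PDF user space).
    for g in sorted(glyphs, key=lambda g: -g[1]):
        if rows and abs(rows[-1][0][1] - g[1]) <= y_tol:
            rows[-1].append(g)
        else:
            rows.append([g])
    return rows


def split_cells(row, char_width=6.0, gap_factor=1.5):
    """Guess cells: a horizontal gap wider than gap_factor * char_width starts a new cell."""
    cells = []
    prev_x = None
    for x, _, ch in sorted(row, key=lambda g: g[0]):
        if prev_x is None or x - prev_x > char_width * gap_factor:
            cells.append(ch)
        else:
            cells[-1] += ch
        prev_x = x
    return cells


def reconstruct_table(glyphs):
    """Recover a list of rows, each a list of cell strings, from loose glyphs."""
    return [split_cells(row) for row in group_rows(glyphs)]


# A two-by-two table flattened into loose glyphs, in no particular order.
glyphs = (
    place("Name", 0, 100) + place("Qty", 60, 100)
    + place("Bolt", 0, 88) + place("12", 60, 88)
)[::-1]

table = reconstruct_table(glyphs)
print(table)
```

Note that the result depends entirely on the thresholds: shrink the column gap or change the font size and the same heuristic will merge or split cells incorrectly, which is why this is guesswork rather than structure.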