Comment by jonathanyc

2 years ago

PDF natively supports selectable/extractable text. Section 9.10 of ISO 32000 is literally “Extraction of Text Content.” I’ve implemented it myself in production software.

There are many good reasons why PDF has a “render glyph” instruction instead of a “render string” one. In particular, your printer and your PDF viewer should not need the same text shaping and layout algorithms in order to render the PDF identically. Oops, your printer runs a different version of HarfBuzz!
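A rough sketch of how those two points fit together. The renderer only has to draw the glyph codes the producer already shaped, with no shaping engine of its own. Extraction is a separate lookup from glyph codes back to Unicode, which is what the ToUnicode CMap mechanism in ISO 32000 section 9.10 provides. The glyph codes and the mapping below are hypothetical, and a plain dict stands in for a parsed CMap.

```python
# Hypothetical glyph codes, as a producer might emit them after shaping the
# word "office" with an "ffi" ligature. Real codes depend on the embedded font.
shown_codes = [0x0052, 0x01A3, 0x0046, 0x0048]

# Hypothetical stand-in for a font's ToUnicode CMap: glyph code -> Unicode text.
to_unicode = {
    0x0052: "o",
    0x01A3: "ffi",  # one glyph maps to several characters (a ligature)
    0x0046: "c",
    0x0048: "e",
}


def extract_text(codes, cmap):
    """Map each shown glyph code back to Unicode; unmapped codes become U+FFFD."""
    return "".join(cmap.get(code, "\ufffd") for code in codes)


print(extract_text(shown_codes, to_unicode))  # office
```

If the producer omits or botches that mapping, the glyphs still render correctly but the extracted text comes out as garbage, which is a fault of the producing software rather than of the format.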

The sibling comment is right that a lot depends on the software that produced the PDF. It’s important to be accurate about where the blame lies. I don’t blame the x86 ISA or the C++ standards committee when an Electron app uses too much memory.
