Comment by davedx

3 years ago

It depends. There are PDFs with rasterized images of text (like in the article, when it’s a scan or photo of a document), then there are PDFs with vector positioned text runs (when it’s usually a result of some digital process). The latter are way easier to process than the former.