Comment by davedx
3 years ago
It depends. Some PDFs contain rasterized images of text (like in the article, when it's a scan or photo of a document), while others contain vector-positioned text runs (usually the result of some digital process). The latter are way easier to process than the former, since the text can be read straight out of the file instead of being recovered with OCR.
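To make the distinction concrete, here is a rough sketch of how one might guess which kind of PDF a file is, using only the Python standard library. The function name `has_text_layer` and both regexes are my own illustration, not anything from the article. It is a crude heuristic: it only handles Flate-compressed or uncompressed streams, ignores encryption and other filters, and a scan that went through OCR will also carry a text layer. For real work a proper PDF library is the better choice.

```python
import re
import zlib

# Body of each stream object: "stream" EOL ... "endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\n?endstream", re.DOTALL)
# A text object (BT ... ET) containing a text-showing operator (Tj or TJ)
# right after a string or array operand.
TEXT_OP_RE = re.compile(rb"BT\s.*?[)\]>]\s*T[jJ]\s.*?\bET\b", re.DOTALL)


def has_text_layer(pdf_bytes: bytes) -> bool:
    """Heuristic: True if any stream appears to draw text with Tj/TJ."""
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj tolerates a trimmed or padded stream tail.
            data = zlib.decompressobj().decompress(raw)
        except zlib.error:
            data = raw  # not Flate data; inspect the bytes as they are
        if TEXT_OP_RE.search(data):
            return True
    return False
```

Usage would be something like `has_text_layer(open("some.pdf", "rb").read())`, where `some.pdf` is a placeholder path. If it returns `False`, the file is probably image-only and needs OCR.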