Comment by mbb70
7 months ago
This is while pretty much all software that extracts structured data from PDFs throws away the text and just OCRs the page. Too many tricks with layouts and fonts.
7 months ago
This is while pretty much all software that extracts structured data from PDFs throws away the text and just OCRs the page. Too many tricks with layouts and fonts.
I'm always surprised how "generate PDF from Word" turns one word into 10 different print points, all with just a single letter.
Or even straight lines in a table. The straight lines from a table boundary get hacked into pieces. You'd think one line would be the ideal presentation for a line, but who are you to judge PDF?