← Back to context

Comment by throwaway81523

3 years ago

I've done stuff like this semi manually. Use pdftotext to get the text tables out of the pdf, eyeball it and massage with emacs keyboard macros, and in some cases python scripts. It's not that big a deal but it is somewhat ad hoc.

I know that OCR software is able to read stuff like magazine articles and figure out column layout, embedded charts, etc. It's weird if is nothing to do that with a pdf. Maybe I'll look around or see if I can hack up something.