Comment by RNCTX

5 months ago

Awesome.

I modified a library card software (Blacklight) into a searchable PDF industrial manual system awhile back on a one-off basis. It couldn't go any further than a contract project that delivered the source code because it's hard to do anything programmatically (at the time) to a PDF without Ghostscript.

I've often thought of rewriting it with Python (and Postgres, to get rid of Solr or Elastic as the search backend), maybe now's the time...

I trust you long enough for a second look because I ctrl-f'd the readme and found "pdfium" so I know I don't have to retread old ground in your github issues about how there's really only a couple of ways to parse a PDF with a semblance of reliability, lol...

(for anyone else reading this getting started with documents.. Adobe and Chrome are really the only PDF rendering libraries that work. PDF.js aka Firefox has always been broken, and Apple's is problematic as well, in both cases rearing their heads in terms of incorrect word / letter spacing).

0 comments

RNCTX

No comments yet

Contribute on Hacker News ↗