Slacker News

Comment by brudgers

5 months ago

https://linux.die.net/man/1/pdftotext is the simplest thing that might work.

It is free and mature.
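Here is a minimal sketch of calling pdftotext from a script. It assumes Python 3.9+ and that the poppler-utils build of pdftotext is on PATH. `pdftotext_args` and `extract_text` are hypothetical helper names. The flags used (`-enc`, `-layout`, and `-` as the output file to write to stdout) are documented in the man page linked above.

```python
import shutil
import subprocess


def pdftotext_args(pdf_path: str, layout: bool = False) -> list[str]:
    """Build the argv for pdftotext. Passing '-' as the output file sends text to stdout."""
    args = ["pdftotext", "-enc", "UTF-8"]  # ask for UTF-8 output explicitly
    if layout:
        args.append("-layout")  # try to keep the original physical layout (columns, tables)
    args += [pdf_path, "-"]
    return args


def extract_text(pdf_path: str, layout: bool = False) -> str:
    """Run pdftotext on a PDF and return whatever its text layer yields."""
    if shutil.which("pdftotext") is None:
        raise FileNotFoundError("pdftotext is not on PATH (try installing poppler-utils)")
    result = subprocess.run(
        pdftotext_args(pdf_path, layout),
        capture_output=True,
        encoding="utf-8",  # matches the -enc UTF-8 flag above
        check=True,        # raise CalledProcessError if pdftotext fails
    )
    return result.stdout
```

For example, `extract_text("paper.pdf", layout=True)` returns the document as one string. Pages are separated by form feeds unless `-nopgbrk` is added to the arguments.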

jbaiter  5 months ago

That won't work for scanned PDFs without a text layer, and even when a PDF has one, it's not guaranteed to work.
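The first failure mode (no text layer at all) can be spotted cheaply from pdftotext's own output. Below is a rough heuristic sketch. It assumes the text came from pdftotext with its default form-feed page breaks. The name `probably_needs_ocr` and the 25-character threshold are arbitrary, hypothetical choices. It will not catch a text layer that is present but garbled, which is the second case mentioned.

```python
def probably_needs_ocr(text: str, min_chars_per_page: int = 25) -> bool:
    """Guess whether pdftotext output came from a PDF with no usable text layer.

    pdftotext separates pages with form feeds ("\\f"). If more than half of the
    pages contain fewer than `min_chars_per_page` non-whitespace characters,
    treat the document as scanned and in need of OCR.
    """
    # Drop the trailing page break(s) so they don't count as an extra empty page.
    pages = text.rstrip("\f").split("\f")
    sparse = sum(
        1 for page in pages if len("".join(page.split())) < min_chars_per_page
    )
    return sparse > len(pages) / 2
```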

  • brudgers  5 months ago

    "Might work" comes with neither express nor implied warranty.

    OCR is another thing that might work, and it is also simpler than an LLM.
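The thread doesn't name an OCR tool. One possible fallback is sketched below. It assumes poppler's `pdftoppm` and the Tesseract CLI are both installed. pdftoppm renders each page to a PNG, and tesseract reads each image. This two-step approach is used because the tesseract CLI takes image files rather than PDFs. The function names, the 300 DPI default, and the `eng` language default are illustrative assumptions, not anything from the thread.

```python
import glob
import os
import shutil
import subprocess
import tempfile


def page_number(image_path: str) -> int:
    """Extract N from pdftoppm output names such as '/tmp/x/page-07.png'."""
    stem = os.path.splitext(os.path.basename(image_path))[0]  # e.g. 'page-07'
    return int(stem.rsplit("-", 1)[1])


def ocr_pdf(pdf_path: str, dpi: int = 300, lang: str = "eng") -> str:
    """OCR every page of a PDF. Pages in the result are separated by form feeds."""
    for tool in ("pdftoppm", "tesseract"):
        if shutil.which(tool) is None:
            raise FileNotFoundError(f"{tool} is not on PATH")
    with tempfile.TemporaryDirectory() as tmp:
        prefix = os.path.join(tmp, "page")
        # Rasterize each page to <prefix>-<N>.png at the requested resolution.
        subprocess.run(
            ["pdftoppm", "-r", str(dpi), "-png", pdf_path, prefix], check=True
        )
        # Sort numerically so page 10 comes after page 9, whatever the padding.
        images = sorted(glob.glob(prefix + "-*.png"), key=page_number)
        texts = []
        for image in images:
            # Using "stdout" as the output base makes tesseract print the text.
            result = subprocess.run(
                ["tesseract", image, "stdout", "-l", lang],
                capture_output=True,
                encoding="utf-8",
                check=True,
            )
            texts.append(result.stdout)
    return "\f".join(texts)
```

OCR is much slower than reading an existing text layer, so a script would typically try pdftotext first and fall back to `ocr_pdf` only for documents that come back nearly empty.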
