Comment by xyst
6 days ago
Seeing blind recommendations for AI slop is very disappointing for HN.
For OP: there is a library written in Rust that can do exactly what you need, with very high accuracy and good performance [1].
You would need to install its OCR dependencies to get it working on scanned books [2].
[1] https://github.com/yobix-ai/extractous
[2] https://github.com/yobix-ai/extractous?tab=readme-ov-file#-s...
That looks rather nice, actually. Thanks.
I especially like the approach of graalifying Tika.