Comment by xyst

6 days ago

Seeing blind recommendations for AI slop is very disappointing for HN.

For OP, there is a library written in Rust that can do exactly what you need with very high accuracy and performance [1].

You would need to install the OCR dependencies to get it to work on scanned books [2].

[1] https://github.com/yobix-ai/extractous

[2] https://github.com/yobix-ai/extractous?tab=readme-ov-file#-s...

That looks rather nice, actually. Thanks.

I especially like the approach to graalify Tika.