Comment by fiddlerwoaroof
1 year ago
For splitting double pages, this is the best tool I’ve seen: https://github.com/mbaeuerle/Briss-2.0
For the other issues, I haven’t found any single good tool, but I’ve stitched together things like unpaper, Ghostscript and deskew (https://github.com/galfar/deskew).
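To give a rough idea of what "stitched together" can look like, here is a minimal Python sketch that chains the three tools. It is only one possible arrangement, not the exact setup described above. Assumptions: `gs`, `unpaper` and `deskew` are on your PATH, you already know the page count, and the function names, file naming scheme and the 300 dpi default are all made up for illustration.

```python
import subprocess
from pathlib import Path


def rasterize_cmd(pdf, out_dir, dpi=300):
    # Ghostscript: render every PDF page to a grayscale PGM file.
    # %04d is expanded by Ghostscript to the page number, starting at 1.
    out_pattern = Path(out_dir) / "page-%04d.pgm"
    return ["gs", "-dNOPAUSE", "-dBATCH", "-dSAFER", "-sDEVICE=pgmraw",
            f"-r{dpi}", f"-sOutputFile={out_pattern}", str(pdf)]


def unpaper_cmd(src, dst):
    # unpaper with its default cleanup settings: input file, then output file.
    return ["unpaper", str(src), str(dst)]


def deskew_cmd(src, dst):
    # galfar/deskew: -o names the output file, the input file comes last.
    return ["deskew", "-o", str(dst), str(src)]


def cleanup_commands(pdf, work_dir, page_count, dpi=300):
    # Build the full command list: one rasterize step, then
    # unpaper and deskew for each page (hypothetical file names).
    work = Path(work_dir)
    cmds = [rasterize_cmd(pdf, work, dpi)]
    for n in range(1, page_count + 1):
        raw = work / f"page-{n:04d}.pgm"
        cleaned = work / f"clean-{n:04d}.pgm"
        final = work / f"final-{n:04d}.png"
        cmds.append(unpaper_cmd(raw, cleaned))
        cmds.append(deskew_cmd(cleaned, final))
    return cmds


def run_all(cmds):
    # Run each command in order and stop at the first failure.
    for cmd in cmds:
        subprocess.run(cmd, check=True)
```

Something like `run_all(cleanup_commands("scan.pdf", "work", page_count=120))` would then process a 120-page scan, provided the `work` directory already exists. Building the command lists separately from running them makes it easy to print them first and check them by eye.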
Also, if you need OCR, hocr-tools and Google’s Document AI OCR API have worked really well for me (I tried Gemini too, but it runs into issues with big documents).