Comment by oedemis

17 days ago

There is also https://ds4sd.github.io/docling/ from IBM Research, which is MIT-licensed and tracks bounding boxes in a rich JSON format.

Docling has worked well for me. It handles scenarios that crashed ChatGPT Pro. The only problem is that it's super annoying to install. When I have a minute, I might package it for Homebrew.

  • Did you compare it to Tesseract?

    If it's superior (esp. for scans with text flowing around image boxes), and if you do end up packaging it up for brew, know that there's at least one developer who will benefit from your work (for a side-project, but that goes without saying).

    Thanks in advance!