Comment by rapjul
16 hours ago
Docling works quite well for me to convert a scanned book PDF to Markdown text.
On the command line, first install `uv` from https://github.com/astral-sh/uv?tab=readme-ov-file#installat..., then run `uv tool install -U "docling[tesserocr,ocrmac,vlm]"`. The extras in brackets pull in tesserocr, ocrmac (macOS only), and vlm (for running a small image-to-text model to get descriptions of images).
See https://github.com/DS4SD/docling/blob/main/pyproject.toml#L1... for all the extra installation options.
For cached/offline use, run `docling-tools models download` to download their models.
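Once it is installed, the conversion itself is a single `docling` call. Here is a rough sketch of wrapping that call from Python. The flags `--to`, `--ocr-engine`, and `--output` are what I believe the CLI accepts, so check `docling --help` on your version. The helper names, the file names, and the assumption that the output is written as `<input stem>.md` are mine, not from the Docling docs.

```python
import shutil
import subprocess
from pathlib import Path


def build_docling_command(pdf_path, out_dir, ocr_engine="tesserocr"):
    """Build the argument list for one docling CLI call.

    Hypothetical helper; the flag names are assumptions to verify
    against `docling --help`.
    """
    return [
        "docling",
        str(pdf_path),
        "--to", "md",                # export format: Markdown
        "--ocr-engine", ocr_engine,  # e.g. "tesserocr", or "ocrmac" on macOS
        "--output", str(out_dir),    # directory that receives the result
    ]


def convert_scanned_pdf(pdf_path, out_dir, ocr_engine="tesserocr"):
    """Run docling if it is on PATH; return the expected .md path, else None."""
    cmd = build_docling_command(pdf_path, out_dir, ocr_engine)
    if shutil.which("docling") is None:
        # docling is not installed (or not on PATH), so do nothing.
        return None
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    # Assumption: docling names the output after the input file's stem.
    return Path(out_dir) / (Path(pdf_path).stem + ".md")
```

Usage would look like `convert_scanned_pdf("book.pdf", "out")`, where `book.pdf` and `out` are placeholder names.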