Comment by vikp
16 days ago
Docling is a great project, happy to see more people building in the space.
Marker output will be higher quality than Docling output across most doc types, especially with the --use_llm flag. A few specific things we do differently:
- We have a hybrid mode with Gemini that merges tables across pages, improves quality on forms, etc.
- We run an ordering model, so ordering is better for docs where the PDF order is bad
- OCR is a lot better; we train our own model, surya - https://github.com/VikParuchuri/surya
- References and links
- Better equation conversion (soon including inline)