Comment by llm_trw

9 months ago

Because no one knows how to prep the images. With the right file type and resolution I get under a single character error per 10 pages and it's been that good since the late 00s.

With handwriting? With mixed fonts? Tesseract requires heavy customization and extension to perform reasonably on these workloads. The off-the-shelf options from major cloud providers blow it out of the water.

  • Never had to use it with handwriting; mixed fonts and text where location carries semantic information: absolutely.

How do you prep the images?

  • My hourly rate starts at $300. If you'd like to hire me you're more than welcome to. I've done this work for a number of companies in the past.