Comment by freedmand
9 months ago
Agreed. Tesseract does not handle handwriting or distorted text well, e.g. colored text over an image background, to the point that its output would hurt any downstream LLM trying to make sense of the contents. It won’t even pick out bounding boxes.
I doubt they are running an OCR model, but if they were, it would likely be an in-house one trained with more modern techniques.