Comment by freedmand

2 years ago

Agreed. Tesseract does not handle handwriting or distorted text well (e.g. colored text over an image background), to the point that its output would hurt any downstream LLM trying to make sense of the contents. It won’t even pick out bounding boxes for text like that.

I doubt they are running an OCR model, but if they actually were, it would likely be an in-house one trained with more modern techniques.