Comment by michaelt

1 year ago

> Also, what's strange is that no free or paid OCR engine was added to the mix for the evaluation.

The article says they evaluated "Traditional OCR providers (Azure, AWS Textract, Google Document AI, etc.)"

Are those not paid OCR engines?

michaelt

You're absolutely correct. I read the article quite fast and assumed those were AI systems as well, albeit not LLM-powered ones.

I've been using computers since I could read, and when somebody says "traditional OCR", I think of older systems like Tesseract or ABBYY's FineReader, which, again, can be automated for batch processing, albeit mostly locally.

Sending a huge number of PDFs to a cloud server for processing still feels a bit alien to me, since from my perspective it can be done very efficiently on-premises (or on a VPS running said software).
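
To make the "automated for batch processing, locally" part concrete, here's a minimal sketch of what I have in mind, assuming the `tesseract` binary is installed and on `PATH`. The function names and the directory layout are hypothetical, made up for illustration. It also assumes the pages are already PNG images: as far as I know the Tesseract CLI takes image input rather than PDFs, so PDFs would need to be rasterized first, and that step is left out here.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_cmd(image: Path, out_dir: Path, lang: str = "eng") -> list[str]:
    """Build the CLI call: tesseract <image> <output base> -l <lang>."""
    # Tesseract appends ".txt" to the output base on its own.
    out_base = out_dir / image.stem
    return ["tesseract", str(image), str(out_base), "-l", lang]


def ocr_directory(in_dir: Path, out_dir: Path, lang: str = "eng") -> list[Path]:
    """Run OCR locally on every PNG in in_dir; return the text files written."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract binary not found on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for image in sorted(in_dir.glob("*.png")):
        # check=True stops the batch on the first page that fails to process.
        subprocess.run(build_tesseract_cmd(image, out_dir, lang), check=True)
        written.append(out_dir / f"{image.stem}.txt")
    return written
```

Something like `ocr_directory(Path("scans"), Path("ocr_out"))` would then process a whole folder without anything leaving the machine. It's sequential on purpose to keep the sketch short; running pages in parallel would be the obvious next step for large batches.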