Comment by sgc

1 year ago

> What would be interesting is finding which two OCR/AI systems make the most different mistakes and running documents against both. Flagging only the disagreements for human verification would reduce the task substantially.

There have been OCR products that do that for decades, and I would hope all the OCR startups are doing the same already. Oftentimes something is objectively difficult to read, and the various models will all fail in the same place, which reduces the expected utility of this method. It still helps, of course. I forget the name of the product, but there was one that used about five OCR engines and took a consensus vote to optimize its output. It could never beat ABBYY FineReader, though; it was a distant second place.
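The disagreement-flagging idea is simple to sketch. A minimal illustration (hypothetical engine outputs; `flag_disagreements` is just a name I made up) aligns two engines' word sequences and surfaces only the spans where they differ, which is what a human would then review:

```python
import difflib

def flag_disagreements(text_a: str, text_b: str) -> list[tuple[str, str]]:
    """Return (span_from_a, span_from_b) pairs where two OCR outputs disagree."""
    words_a, words_b = text_a.split(), text_b.split()
    flagged = []
    # SequenceMatcher aligns the two word sequences; non-"equal" opcodes
    # mark the regions where the engines produced different text.
    matcher = difflib.SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            flagged.append((" ".join(words_a[i1:i2]), " ".join(words_b[j1:j2])))
    return flagged

# Hypothetical outputs from two different OCR engines on the same scan.
engine_a = "The quick brown fox jumps over the 1azy dog"
engine_b = "The quick brown fox jumps over the lazy d0g"

for a, b in flag_disagreements(engine_a, engine_b):
    print(f"needs review: {a!r} vs {b!r}")
```

A multi-engine consensus system would generalize this to N engines and a majority vote per span; the failure mode the comment describes is exactly the case where all N engines agree on the same wrong reading, so no disagreement is ever flagged.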