
Comment by Eridrus

1 year ago

5% accuracy can be worth a lot.

The price of any of these services pales in comparison to getting a human involved in any fraction of cases.

It is likely reasonable to expect the base LMs to keep getting better and for there to not be a moat on accuracy in the long term, but businesses are not just built on benchmark accuracy and have plenty of other ways to survive, even if the technology under the hood changes.

YES

>> 5% accuracy can be worth a lot.

Most surprising to me about these results is that the BEST error rate was over 8% (91.7% accuracy) and the worst was 40%.

Their method of calculating errors seems quite good:

>> Accuracy is measured by comparing the JSON output from the OCR/Extraction to the ground truth JSON. We calculate the number of JSON differences divided by the total number of fields in the ground truth JSON. We believe this calculation method lines up most closely with a real world expectation of accuracy.

>> Ex: if you are tasked with extracting 31 values from a document, and make 4 mistakes, that results in an 87% accuracy.
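The metric quoted above can be sketched in a few lines. This is my own reconstruction of the described calculation, not the benchmark's actual code, and it assumes a flat JSON object (no nesting):

```python
# Field-level accuracy as described in the quote: count mismatched fields,
# divide by the total number of fields in the ground truth.
def extraction_accuracy(predicted: dict, ground_truth: dict) -> float:
    total = len(ground_truth)
    if total == 0:
        return 1.0
    mistakes = sum(
        1 for field, expected in ground_truth.items()
        if predicted.get(field) != expected
    )
    return 1 - mistakes / total

# The worked example from the quote: 31 fields, 4 mistakes -> ~87%.
truth = {f"field_{i}": i for i in range(31)}
pred = dict(truth)
for i in range(4):
    pred[f"field_{i}"] = "wrong"
print(round(extraction_accuracy(pred, truth), 2))  # → 0.87
```

A real implementation would need to recurse into nested objects and arrays and decide how to count missing or extra fields, but the arithmetic is the same.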

Especially where dealing with numbers and money, having 10% of them being wrong seems unusable, often worse than doing nothing.

Having humans check the results instead of doing the transcriptions would be better, but humans are notoriously bad at maintaining vigilance doing the same task over many documents.

What would be interesting is finding which two OCR/AI systems make the most different mistakes and running documents against both. Flagging only the disagreements for human verification would reduce the task substantially.
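The triage idea above is simple to sketch: accept fields where the two systems agree, and send only the disagreements to a human. This is a hypothetical illustration (the field names and values are made up), not any particular product's pipeline:

```python
# Run two OCR/extraction systems on the same document, auto-accept fields
# where their outputs agree, and flag disagreements for human verification.
def triage(output_a: dict, output_b: dict):
    agreed, flagged = {}, []
    for field in sorted(output_a.keys() | output_b.keys()):
        a, b = output_a.get(field), output_b.get(field)
        if a == b:
            agreed[field] = a
        else:
            flagged.append((field, a, b))
    return agreed, flagged

a = {"total": "104.50", "date": "2024-01-05", "vendor": "Acme"}
b = {"total": "104.50", "date": "2024-01-06", "vendor": "Acme"}
agreed, flagged = triage(a, b)
print(list(agreed))                 # → ['total', 'vendor']
print([f for f, _, _ in flagged])   # → ['date']
```

How much this reduces the review queue depends on how *uncorrelated* the two systems' errors are, which is exactly why you'd want the pair that makes the most different mistakes.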

  • > What would be interesting is finding which two OCR/AI systems make the most different mistakes and running documents against both. Flagging only the disagreements for human verification would reduce the task substantially.

    There have been OCR products that do that for decades, and I would hope all the OCR startups are doing the same already. Oftentimes something is objectively difficult to read and the various models will all fail in the same place, reducing the expected utility of this method. It still helps, of course. I forget the name of the product, but there was one that used about 5 OCR engines and would use consensus to optimize its output. It could never beat ABBYY FineReader though; it was a distant second place.
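The multi-engine consensus approach mentioned above can be sketched as per-field majority voting. This is a generic illustration of the technique, not the unnamed product's actual method, and the engine outputs are invented:

```python
from collections import Counter

# Per-field majority vote across several extraction engines: each engine's
# value for a field counts as one vote; the most common value wins.
def consensus(outputs: list[dict]) -> dict:
    fields = set().union(*(o.keys() for o in outputs))
    result = {}
    for field in sorted(fields):
        votes = Counter(o[field] for o in outputs if field in o)
        result[field] = votes.most_common(1)[0][0]
    return result

engines = [
    {"amount": "42.00", "name": "ACME"},
    {"amount": "42.00", "name": "ACNE"},   # one engine misreads the name
    {"amount": "42.08", "name": "ACME"},   # another misreads the amount
]
print(consensus(engines))  # → {'amount': '42.00', 'name': 'ACME'}
```

Note the caveat from the comment: when a region is genuinely hard to read, the engines tend to fail in the same way, so the majority can be confidently wrong and voting buys you less than independence would suggest.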

I think 87% to 92% accuracy really isn't much of a difference. You're still going to get errors to the point where the level and amount of checking you need to do isn't affected. Even at 98-99% you still have to do a lot of error checking.

But you get most of the bang for the buck for 1/10th the cost so I think overall it's far, far superior.