Comment by BugsJustFindMe
20 hours ago
> I don’t think I believe that OCR can’t do it but random humans can
I do.
> OCR is VERY good
Uh, my experience is extremely different.
I would challenge you to find a picture of text that you think a human can read and OCR cannot. I’m happy to demonstrate. The text shown in this article is trivial.
The archivists themselves say that they run into such texts often enough that this program was needed:
> The agency uses artificial intelligence and a technology known as optical character recognition to extract text from historical documents. But these methods don’t always work, and they aren’t always accurate.
They are absolutely aware of the advances in these tools, so if they say they're not completely there yet, I believe them. One likely reason is that the models have seen less 1800s-era cursive in their training data than modern cursive.
It's likely that with more human-tagged data they could improve on the state of the art for OCR, but it's pretty arrogant to doubt the agency in charge of this sort of thing when they say the tech isn't there yet.
Can someone please post a sample of one of these images that can only be read by a human for us naive OCR believers to see?
Then please provide a single example that we can’t instantly solve. Happy to prove them wrong.
> I would challenge you to find a picture of text that you think a human can read and OCR cannot.
Are you aware of CAPTCHA[0] images?
0 - https://en.wikipedia.org/wiki/CAPTCHA
Text that is _intentionally constructed_ to fool computers but not humans is obviously out of scope. But they’re generally easily solved with OCR these days anyway.
Solvable with the right tools.
https://github.com/noCaptchaAi/NoCaptcha-Ai-Browser-Extensio...
Yeah ok, but it might take me a few tries because I don't know what you're using. I hope that's agreeable?
What does your OCR say that these say? The first one isn't too hard for a human (assuming appropriate language skill). The second one is a bit more difficult.
https://imgur.com/a/CDU6Lgs
Your experience is obsolete.
Oh, ok then.
I mean, all you have to do is feed the image to ChatGPT, and it will read it basically as well as you can.
Denying/downvoting reality is always an option, of course.