Actually, I think in 2025 you are correct; we just haven’t got the best tech into the OCR software that’s out there in the real world. I just pasted the letter from the article into ChatGPT (4o) and asked “what does this old letter say?” The response:
---
The following is the declaration of James Lambert, a soldier of the Revolutionary War in North America.
The said James Lambert on this day personally appeared in the Probate Court of the County of Dearborn in the State of Indiana and at the November Term of said Court (1841), it being a court of record established by the laws of Indiana and made oath that:
On the 25th day of March 1842 he will be eighty-five years old; that he was born in the State of Maryland; that he is now a resident of said county and has been for the 27 years last past; that he has lived in Virginia, Maryland, Pennsylvania…
---
I've been trying every state-of-the-art OCR solution on my students' handwritten essays for fifteen years and have yet to find anything even close to acceptable.
I'm the founder of handwritingocr.com - have you checked out our free trial? We have loads of educators using our service for exactly this, and they seem quite happy with it.
What methods have you tried?
> I don’t think I believe that OCR can’t do it but random humans can
Considering the people involved are experts in their field, are certainly aware of OCR capabilities, and have publicized a need thusly:
Perhaps "random humans" can perform tasks which could reshape your belief:
> OCR is VERY good
No. Sign up and look at the current missions. A lot of what they want transcribed is totally straightforward to OCR --- not even LLM, OCR. Whatever's going on, and I'm not second-guessing them, a pretty big chunk of their problem appears to be well within the state of the art. The appeal to authority isn't going to play here, because you can just click through to the archives and see what they're trying to figure out.
> No. Sign up and look at the current missions. A lot of what they want transcribed is totally straightforward to OCR --- not even LLM, OCR. Whatever's going on, and I'm not second-guessing them, a pretty big chunk of their problem appears to be well within the state of the art.
If it's that easy, then do it and be the hero they want.
Or maybe, just maybe, "a pretty big chunk of their problem appears to be well within the state of the art" is a sweeping generalization lacking understanding of the difficulties involved.
Also, you seem to have taken issue with the phrase “random humans” because you’re confused about what’s being done here. It is random humans. Non-experts.
Experts are asking for the help of non-experts.
> Anyone with an internet connection can volunteer to transcribe historical documents and help make the archives’ digital catalog more accessible
There are conceivable reasons why they may be telling a half truth here. Just engaging the public is a worthy goal here.
> There are conceivable reasons why they may be telling a half truth here. Just engaging the public is a worthy goal here.
Asserting an ulterior motive without supporting proof is to engage in conspiracy theories.
Sometimes a cigar is just a cigar.[0]
0 - https://quoteinvestigator.com/2011/08/12/just-a-cigar/
> I don’t think I believe that OCR can’t do it but random humans can
I do.
> OCR is VERY good
Uh, my experience is extremely different.
I would challenge you to find a picture of text that you think a human can read and OCR cannot. I’m happy to demonstrate. The text shown in this article is trivial.
The archivists themselves say that they run into such texts often enough that this program was needed:
> The agency uses artificial intelligence and a technology known as optical character recognition to extract text from historical documents. But these methods don’t always work, and they aren’t always accurate.
They are absolutely aware of the advances in these tools, so if they say they're not completely there yet I believe them. One likely reason is that the models probably have less 1800s-era cursive in their training set than they do modern cursive.
It's likely that with more human-tagged data they could improve on the state of the art for OCR, but it's pretty arrogant to doubt the agency in charge of this sort of thing when they say the tech isn't there yet.
> I would challenge you to find a picture of text that you think a human can read and OCR cannot.
Are you aware of CAPTCHA[0] images?
0 - https://en.wikipedia.org/wiki/CAPTCHA
Yeah ok, but it might take me a few tries because I don't know what you're using. I hope that's agreeable?
What does your OCR say that these say? The first one isn't too hard for a human (assuming appropriate language skill). The second one is a bit more difficult.
https://imgur.com/a/CDU6Lgs
Your experience is obsolete.
Oh, ok then.