Comment by jncfhnb

1 year ago

I don’t think I believe that OCR can’t do it but random humans can

OCR is VERY good

54 comments

jncfhnb

Actually I think in 2025 you are correct, we just haven’t got the best tech into the OCR software that’s out there in the real world. I just pasted the letter from the article into ChatGPT (4o) and asked “what does this old letter say?” The response:

----

The following is the declaration of James Lambert, a soldier of the Revolutionary War in North America.

The said James Lambert on this day personally appeared in the Probate Court of the County of Dearborn in the State of Indiana and at the November Term of said Court (1841), it being a court of record established by the laws of Indiana and made oath that:

On the 25th day of March 1842 he will be eighty-five years old; that he was born in the State of Maryland; that he is now a resident of said county and has been for the 27 years last past; that he has lived in Virginia, Maryland, Pennsylvania…

----

ozbonus 1 year ago

I've been trying every state-of-the-art OCR solution on my students' handwritten essays for fifteen years and have yet to find anything even close to acceptable.

wriggler 1 year ago

I'm the founder of handwritingocr.com - have you checked out our free trial? We have loads of educators using our service for exactly this, and they seem quite happy with it.

jncfhnb 1 year ago

What methods have you tried?

AdieuToLogic 1 year ago

> I don’t think I believe that OCR can’t do it but random humans can

Considering the people involved are experts in their field, are certainly aware of OCR capabilities, and have publicized a need thusly:

  ... the National Archives is looking for volunteers who can 
  help transcribe and organize its many handwritten records ...

Perhaps "random humans" can perform tasks which could reshape your belief:

> OCR is VERY good

tptacek 1 year ago
No. Sign up and look at the current missions. A lot of what they want transcribed is totally straightforward to OCR --- not even LLM, OCR. Whatever's going on, and I'm not second-guessing them, a pretty big chunk of their problem appears to be well within the state of the art. The appeal to authority isn't going to play here, because you can just click through to the archives and see what they're trying to figure out.
- AdieuToLogic 1 year ago
  
  > No. Sign up and look at the current missions. A lot of what they want transcribed is totally straightforward to OCR --- not even LLM, OCR. Whatever's going on, and I'm not second-guessing them, a pretty big chunk of their problem appears to be well within the state of the art.
  If it's that easy, then do it and be the hero they want.
  Or maybe, just maybe, "a pretty big chunk of their problem appears to be well within the state of the art" is a sweeping generalization lacking understanding of the difficulties involved.
jncfhnb 1 year ago
Also, you seem to have taken issue with the phrase “random humans” because you’re confused about what’s being done here. It is random humans. Non-experts.
Experts are asking for the help of non-experts.
> Anyone with an internet connection can volunteer to transcribe historical documents and help make the archives’ digital catalog more accessible
- AdieuToLogic 1 year ago
  
  > Also, you seem to have taken issue with the phrase “random humans” because you’re confused about what’s being done here. It is random humans. Non-experts.
  I'm largely aligned with your interpretation of "random humans", with a clarification below. The experts I was referencing are the ones you identified:
  > Experts are asking for the help of non-experts.
  The call to action by the archivists (experts), IMHO, is intended to engage people with an interest in the topic. So not really random in the mathematical sense, but perhaps better thought of as "unknown interested parties."
  Granted, this is my unsubstantiated opinion.
jncfhnb 1 year ago
There are conceivable reasons why they may be telling a half-truth here. Just engaging the public is a worthy goal.
- AdieuToLogic 1 year ago
  
  > There are conceivable reasons why they may be telling a half-truth here. Just engaging the public is a worthy goal.
  Asserting an ulterior motive without supporting proof is to engage in conspiracy theories.
  Sometimes a cigar is just a cigar.[0]
  0 - https://quoteinvestigator.com/2011/08/12/just-a-cigar/

BugsJustFindMe 1 year ago

> I don’t think I believe that OCR can’t do it but random humans can

I do.

> OCR is VERY good

Uh, my experience is extremely different.

jncfhnb 1 year ago
I would challenge you to find a picture of text that you think a human can read and OCR cannot. I’m happy to demonstrate. The text shown in this article is trivial.
- demosthanos 1 year ago
  
  The archivists themselves say that they run into such texts often enough that this program was needed:
  > The agency uses artificial intelligence and a technology known as optical character recognition to extract text from historical documents. But these methods don’t always work, and they aren’t always accurate.
  They are absolutely aware of the advances in these tools, so if they say they're not completely there yet, I believe them. One likely reason is that the models have less 1800s-era cursive in their training sets than modern cursive.
  It's likely that with more human-tagged data they could improve on the state of the art for OCR, but it's pretty arrogant to doubt the agency in charge of this sort of thing when they say the tech isn't there yet.
- AdieuToLogic 1 year ago
  
  > I would challenge you to find a picture of text that you think a human can read and OCR cannot.
  Are you aware of CAPTCHA[0] images?
  0 - https://en.wikipedia.org/wiki/CAPTCHA
- BugsJustFindMe 1 year ago
  
  Yeah ok, but it might take me a few tries because I don't know what you're using. I hope that's agreeable?
  What does your OCR say that these say? The first one isn't too hard for a human (assuming appropriate language skill). The second one is a bit more difficult.
  https://imgur.com/a/CDU6Lgs
CamperBob2 1 year ago
Your experience is obsolete.
- BugsJustFindMe 1 year ago
  
  Oh, ok then.