Comment by tptacek
20 hours ago
OK, fair enough, but can you find one in this article that's hard for an LLM? The gnarliest one I saw, 4o handled instantly, and I went back and looked carefully at the image and the text and I'm sold.
Like if this is a crowdsourcing project, why not do a first pass with an LLM and present users with both the image and the best-effort LLM pass?
Later
I signed up, went to the current missions, and they all seem to be post-1900 and all typeset. They're blurry, but 4o cuts through them like a hot knife through butter.
My parents have saved letters from their parents which are written in cursive but in two perpendicular layers. Meaning the writing goes horizontally in rows and then when they got to the end of the page it was turned 90 degrees and continued right on top of what was already there for the whole page. This was apparently to save paper and postage. It looks like an unintelligible jumble but my mother can actually decipher it. Maybe that’s what the LLMs are having trouble with?
Edit: apparently it’s called cross writing [1]
1: https://highshrink.com/2018/01/02/criss-cross-letters/
Are they having trouble? You can sign up right now and get tasks from the archive that seem trivial for 4o (by which I mean: feed a screenshot to 4o, get a transcription, and spot check it).
Did you actually check it? Sonnet 3.5 generates text that seems legitimate and generally correct, but misreads important details. LLMs are particularly deceptive because they will be internally consistent - they'll reuse the same incorrect name in both places and will hallucinate information that seems legit, but in fact is just made-up.
Just have version control, and allow randomized spot checks with experts to have a known error rate.
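A minimal sketch of what that spot-checking could look like, assuming every document has an ID and an expert can mark a sampled transcription as correct or not. The function names and the `expert_is_correct` callback are hypothetical, and the Wilson interval is just one common way to put an error bar on the estimate, not something the project is known to use.

```python
import math
import random


def sample_for_review(doc_ids, k, seed=None):
    """Pick k documents uniformly at random for expert review."""
    rng = random.Random(seed)
    return rng.sample(list(doc_ids), k)


def wilson_interval(errors, n, z=1.96):
    """Wilson score interval for an error proportion (z=1.96 is roughly 95%)."""
    if n == 0:
        raise ValueError("need at least one reviewed document")
    p = errors / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def estimate_error_rate(doc_ids, k, expert_is_correct, seed=None):
    """Estimate the share of documents whose transcription has an error."""
    sample = sample_for_review(doc_ids, k, seed)
    # expert_is_correct is a hypothetical callback: True if the expert
    # found no errors in that document's transcription.
    errors = sum(1 for doc_id in sample if not expert_is_correct(doc_id))
    low, high = wilson_interval(errors, len(sample))
    return errors / len(sample), low, high
```

The point of sampling at random rather than letting reviewers choose is that the measured rate then generalizes to the documents nobody looked at.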
You don't use an LLM for this, but other transformer-based OCR models like TrOCR, which have very low character and word error rates (CER and WER).
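For anyone unfamiliar with the metrics: CER and WER are usually computed as the edit distance between the model output and a reference transcription, divided by the length of the reference (in characters or words). A rough sketch in plain Python follows. The function names are my own, and real evaluations typically normalize case and punctuation first, which this does not.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edits / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word edits / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)
```

Note that a low CER can still hide a wrong name or date: a single substituted character in a year is a tiny fraction of the page and a serious error for a historian.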
> Like if this is a crowdsourcing project, why not do a first pass with an LLM and present users with both the image and the best-effort LLM pass?
Possibly for the reason that came up in your other post: you mentioned that you spot checked the result.
Back when I was in historical research, and occasionally involved in transcription projects, the standard was 2-3 independent transcriptions per document.
Maybe the National Archives will pass documents to an LLM and use the output as 1 of their 2-3 transcriptions. It could reduce how many duplicate transcriptions are done by humans. But I'll be surprised if they jump to accepting spot-checked LLM output anytime soon.
You get that I'm not saying they should just commit LLM outputs as transcriptions, right?
My guess is because it’s the Smithsonian, they’re just not willing to trust an LLM’s transcription enough to put their name on it. I imagine they’re rather conservative. And maybe some AI-skeptic protectionist sentiments from the professional archivists. Seems like it could change with time though.
> My guess is because it’s the Smithsonian, they’re just not willing to trust an LLM’s transcription enough to put their name on it. I imagine they’re rather conservative
I expect that's a common theme from companies like that, yet I don't think they understand the issue they think they have there.
Why not have the LLMs do as much work as possible and have humans review and put their own name on it? Do you think they need to just trust and publish the output of the LLM wholeheartedly?
I think too many people saw what a few idiot lawyers did last year and closed the book on LLM usage.
> Why not have the LLMs do as much work as possible and have humans review and put their own name on it?
That's not a good way to improve on the accuracy of the LLM. Humans reviewing work that is 95% accurate are mostly just going to rubber-stamp whatever you show them. This is equally a problem for humans reviewing the work of other humans.
What you actually want, if you're worried about accuracy, is to do the same work multiple times independently and then compare results.
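As a rough illustration of "compare results", here is a sketch that lines up two independent transcriptions word by word and returns only the spans where they disagree, so a reviewer looks at the conflicts instead of rubber-stamping the whole page. It uses Python's standard `difflib`; the function name is mine, and this is not how any particular archive does it.

```python
from difflib import SequenceMatcher


def disagreements(transcript_a, transcript_b):
    """Return the word spans where two independent transcriptions differ."""
    a, b = transcript_a.split(), transcript_b.split()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Keep both readings so a human can adjudicate the conflict.
            diffs.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return diffs
```

Agreement is evidence rather than proof: two transcribers (or two models) can make the same mistake, which is why the transcriptions need to be genuinely independent.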
The incident with the lawyers just highlighted the fundamental problem with LLMs and AI in general. They can't be trusted for anything serious. Worse, they give the appearance of being correct, which lulls human "checkers" into complacency. Total dumpster fire.
The article is from The Smithsonian. The actual project is with the National Archives.
I don't know about this project, but I can easily find thousands of images that gpt-4o can't read, but a human expert can. It can do typed text excellently, Antiqua-style cursive if it's very neat, and Kurrent-style cursive never.
For straightforward reasons, I am commenting on this project, not the space of all possible projects. I did try, once, to get 4o to decode the Zodiac Killer's message. It didn't work.
I'm doing some genealogy work right now on my family's old papers covering the time period from recent years back to the late 17th century. Handwriting styles changed a lot over the centuries and individuals can definitely be identified by their personal cursive style of writing and you can see their handwriting change as they aged.
Then you have the problem that some of these ancestors not only had terrible penmanship but also spelled multi-syllabic words phonetically since they likely were barely educated kids who spent more time when they were young working on the farm or ranch instead of attending school where they would've learned how to spell correctly.
I don't know whether your LLM can handle English words spelled phonetically written in cursive by an individual who had no consistency in forming letters in the words. It is clear after reading a lot of correspondence from this person that they ignored things that didn't seem important in the moment like dotting i's or crossing t's or forming tails on g's, p's, j's, or even beginning letters consistently since they switched between cursive and block letters within a sentence, maybe while they paused to clarify their thoughts. I don't know but it is fascinating to take a walk through life with someone you'll never meet and to discover that many of the things that seemed awesome to you as a kid were also awesome to them and that their life had so many challenges that our generations will never need to endure.
Some of my people have the most beautiful flowing cursive handwriting that looks like the cursive that I was taught in grade school. Others have the most beautiful flowing cursive with custom flourishes and adornments that make their handwriting instantly recognizable and easy to read once you understand their style.
I think there are plenty of edge cases where LLMs will take a drunkard's walk through the scribble and spit out gibberish.
I'm reminded of an old joke though.
Ronald Reagan woke up one snowy Washington, DC morning and took a look out of the window to admire the new-fallen snow. He enjoyed the beautiful scene laid out before him until he saw tracks in the snow below his window and a message, obviously written in piss, that said "Reagan sucks".
He dispatched the Secret Service to the site, where samples were taken of the affected snow and photos were made of the tracks of two people.
After an investigation he received a call from the Secret Service agent in charge, who told him he had some good news and some bad news for him.
The good news is that they know who pissed the message. It was George HW Bush, his Vice President. The bad news is that it was Nancy's handwriting.
Real quick, how long do you think chatgpto4 has existed? How long do you think the National Archive has been archiving?
It's 4o. The crowdsourced transcription project dates back to 2012. My comment is mostly on this article.
> Like if this is a crowdsourcing project...
I'm confused by what you're asking. Are you asking me to like (upvote) your comment if this is a crowdsourcing project? Don't we already know it is a crowdsourcing project?
The use of the word “like” here could be replaced with the word “so”
“So if this is a crowdsourcing project…”
Like is serving as an indication that someone else approximately said the phrase it introduced, in a way often associated with the “Valley Girl” social dialect but regularly seen outside of it.
https://en.wikipedia.org/wiki/Like#As_a_colloquial_quotative
> The use of the word “like” here could be replaced with the word “so”
Correct, but that's not a quotative use of the word. It's a discourse particle. You want to link one subsection down, like as a discourse particle.
https://en.wikipedia.org/wiki/Like#As_a_discourse_particle,_...
One that requires additional work beyond simply feeding the image into the model is this example, which is a mix of barely legible handwritten cursive and easy-to-read typed form. [0] Initially 4o just transcribes (successfully) the bottom half of the text and has to be prompted to attempt the top half, at which point it seems to at best summarize the text instead of giving a direct transcription. [1] In fact it seems to mix up some portions of the latter half of the typed text with the written text in the portion of its "transcription" about "reduced and indigent circumstances".
[0] https://catalog.archives.gov/id/54921817?objectPage=8&object...
[1] Reproducing here, since I cannot share a chat that contains user-uploaded images: "The text in the top half of the image is handwritten and partially difficult to read due to its cursive style and some smudging. Here's my best transcription attempt for the top section:
...resident within four? years, swears and says that the name of the John Hopper mentioned in the foregoing declaration is the same person, and he verily believes the facts as stated in the declaration are true.
He further swears that the said John Hopper is in reduced and indigent circumstances and requires the aid of his country.
The declarant further swears he has no evidence now in his power of service, except the statement of Capt. (illegible name), as to his reduced circumstances ...
Sworn to before me, this day...
Some parts remain unclear due to the handwriting, but let me know if you'd like me to attempt further clarification on specific sections!"
> this example which is a mix of barely legible handwritten cursive and easy to read typed form.
> In fact it seems to mix up some portions of the latter half of the typed text with the written text in the portion of its "transcription" about "reduced and indigent circumstances".
What typed form? What typed text? That image is a single handwritten page, and the writing is quite clean, not "barely legible".† The file related to John Hopper appears to be 59 pages, and some of them are typed, but they're all separate images.
Are you trying to process all 59 pages at once? Why?
I should note that transcription is an excellent use of an LLM in the sense of a language model, as opposed to an "LLM" in the sense of several different pieces of software hooked together in cryptic ways. It would be a lot more useful, for this task, to have direct access to the language model backing 4o than to have access to a chatbot prompt that intermediates between you and the model.
† My biggest problems in reading the page: Cursive n and u are often identical glyphs (both written и), leading me to read "Ind." as "Jud."; and I had trouble with the "roster" at the bottom of the page. What felt weirdest about that was that the crossbar of the "t" is positioned well above the top of the stem, but that can't actually be what tripped me up, because on further review it's a common feature of the author's handwriting that I didn't even notice until I got to the very end of the letter. It's even true in the earlier instance of "Roster" higher up on the page. So my best guess is that the "os" doesn't look right to me.
I misread 1758 as 1958, too, but hopefully (a) that kind of thing wears off as you get used to reading documents about the Revolutionary War; and (b) it's a red flag when someone who died in 1838 was born in 1958 according to a letter written in 1935.
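That kind of red flag is easy to check mechanically once the years have been pulled out of a transcription. A toy sketch, with a function name and rules of my own choosing rather than anything the project uses:

```python
def date_red_flags(birth_year, death_year, document_year):
    """Flag year combinations that cannot all be right at once."""
    flags = []
    if birth_year > death_year:
        flags.append("born after death")
    if birth_year > document_year:
        flags.append("born after the document was written")
    return flags
```

With the misreading above, `date_red_flags(1958, 1838, 1935)` trips both checks, while the correct 1758 trips neither. It cannot tell you which of the years was misread, only that at least one of them was.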