Comment by tptacek

15 hours ago

Isn't this like a bread-and-butter AI task?

“The following is the declaration of James Lambert, a soldier of the Revolutionary War in North America.” “The said James Lambert, on this day personally appeared in the Probate Court of the County of Dearborn in the State of Indiana, at the November Term of said Court [1841], it being a court of record created by the laws of Indiana, and made oath that on the 25th day of March 1842 he will be eighty‐five years old; that he was born in the State of Maryland; that he is now a resident of [said] county and has been for the [27] years last past; that he has lived in Virginia, Maryland, [and Pennsylvania]; that…”

These kinds of problems, matching up cursive to actual text, would seem to play to the absolute best strengths of an LLM, given how much basic language structure the models encode.

> The agency uses artificial intelligence and a technology known as optical character recognition to extract text from historical documents. But these methods don’t always work, and they aren’t always accurate.

I've seen people do that, and the results are... just sad. These modern models insert their Twitter-era "what grabs attention must be true" view into the very little authentic past we still possess.