Comment by rtkwe

2 months ago

What? I pulled one page out of the image set and tried to get GPT-4o to transcribe it. I wasn't just using the easy example from the original article; that one is an easy example meant to draw people into participating in the volunteer effort. If it were one of the inscrutable documents, people would be more likely to be put off the effort.

Did the link in my comment not take you to a single page? (I just tested it in incognito mode too.) For me it's this image [0], and no, I tried just this one page and it didn't do well. If you can get it to work, let me know the prompt; it was late for me and it

[0] https://s3.amazonaws.com/NARAprodstorage/opastorage/live/17/...

No, if I follow the link in your comment, I get a very different image, this one: https://s3.amazonaws.com/NARAprodstorage/opastorage/live/17/...

(page 8 of "Revolutionary War Pension and Bounty Land Warrant Application File W. 7785, John Hopper, N.C.")

I agree that your description of the image the link shows you, which appears to be page 52 of the same file, makes sense. I can read ... some of the handwritten words. None of the long ones.

  • Very, very strange. It's been giving me the same image the whole time, including across multiple devices. The one you link is #52 for me.

    Anyway, yes, that handwritten text is an example where LLMs just can't hack it and people seem to be able to. There's a pretty thorough transcript of the upper handwritten portion of the page I was referencing, available from a user. It's a great example of why you can't just throw an LLM at problems like this. At best they're a tool people can use to transcribe loads of pages quickly, but the output still needs to be hand-checked for accuracy, completeness, and relevance.