
Comment by jfengel

4 hours ago

If you could automate transcription, it would be an enormous boon to researchers.

Reading the handwriting would be really hard, and it would be a massive effort to move all that paper. Just handling it is hard; it's not like flipping through mass-manufactured books.

But I suspect that you could spend a few million dollars to revolutionize the field.

>automate transcription

This also means trusting the LLM to decide what things mean. But there is very likely a great middle ground: have LLMs take their best guesses, then verify the output on significant finds. The risk is the LLM understating something important (false negatives), which would push material that appears mundane but isn't to the bottom of the pile.

  • That's why I suggest the output would be a prioritized list of documents for the researchers to review; the LLM doesn't get the final say, it just makes recommendations. Yes, things would be missed, but the researchers might, in theory, find much more value than with their current search method.
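A minimal sketch of that "recommend, don't decide" idea, assuming each document has already been given two hypothetical scores by some model: an estimated relevance and the model's confidence in its own transcription. The names `Document`, `review_queue`, and the `low_confidence` threshold are illustrative only. Nothing is discarded; documents are only reordered, and low-confidence items are surfaced first so a misread page that merely *looks* mundane isn't buried.

```python
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    relevance: float   # hypothetical model-estimated relevance, 0..1
    confidence: float  # hypothetical model confidence in its transcription, 0..1


def review_queue(docs, low_confidence=0.5):
    """Return every document, ordered for human review.

    Documents the model is unsure about come first (a hedge against
    false negatives); within each tier, higher relevance comes first.
    """
    uncertain = [d for d in docs if d.confidence < low_confidence]
    confident = [d for d in docs if d.confidence >= low_confidence]
    by_relevance = lambda d: d.relevance
    uncertain.sort(key=by_relevance, reverse=True)
    confident.sort(key=by_relevance, reverse=True)
    return uncertain + confident


if __name__ == "__main__":
    docs = [
        Document("a", relevance=0.9, confidence=0.95),
        Document("b", relevance=0.1, confidence=0.3),
        Document("c", relevance=0.6, confidence=0.8),
    ]
    print([d.doc_id for d in review_queue(docs)])  # ['b', 'a', 'c']
```

The two-tier ordering is just one possible design; a real system might instead blend the scores or reserve a fixed share of each review batch for low-confidence items.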

This is already the case with genealogical sites that have ML OCR creating searchable indices of handwritten documents.