Comment by jfengel
4 hours ago
If you could automate transcription, it would be an enormous boon to researchers.
Reading the handwriting would be really hard, and it would be a massive effort to move all that paper. Just handling it is hard; it's not like flipping through mass-manufactured books.
But I suspect that you could spend a few million dollars to revolutionize the field.
>automate transcription
This also means trusting the LLM to decide what things mean. But there is very likely a good middle ground: have the LLM take its best guess, then verify the output on significant finds. The risk is false negatives: the LLM understates something important, and documents that look mundane but aren't end up at the bottom of the pile.
That's why I suggest the output would be a prioritized list of documents for the researchers to review; the LLM doesn't get the final say, it just makes recommendations. Yes, things would be missed, but the researchers might still find much more of value than they do with their current search method.
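A rough sketch of what such a review queue could look like, in Python. Everything here is hypothetical: the `ScoredDoc` type, the `score` field (standing in for whatever "likely significant" rating a model assigns), and the `build_review_queue` helper are illustrative names, not an existing tool. The random audit sample from the low-scored remainder is one assumed way to catch some of the false negatives mentioned above.

```python
import random
from dataclasses import dataclass


@dataclass
class ScoredDoc:
    doc_id: str
    score: float  # hypothetical model-assigned significance score in [0, 1]


def build_review_queue(docs, top_n, audit_n, seed=0):
    """Return (priority, audit).

    priority: the top_n highest-scored documents, highest first.
    audit: a random sample of the remaining documents, so that some
           low-scored material still gets human eyes.
    """
    ranked = sorted(docs, key=lambda d: d.score, reverse=True)
    priority = ranked[:top_n]
    rest = ranked[top_n:]
    rng = random.Random(seed)  # seeded so the sample is reproducible
    audit = rng.sample(rest, min(audit_n, len(rest)))
    return priority, audit


# Made-up example data, for illustration only.
docs = [
    ScoredDoc("ledger-01", 0.10),
    ScoredDoc("letter-07", 0.92),
    ScoredDoc("receipt-33", 0.05),
    ScoredDoc("petition-12", 0.81),
    ScoredDoc("inventory-04", 0.20),
]

priority, audit = build_review_queue(docs, top_n=2, audit_n=1)
print([d.doc_id for d in priority])
print([d.doc_id for d in audit])
```

The researchers still make every call; the ranking only decides what they look at first, and the audit sample gives a running estimate of how much the ranking is missing.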
This is already the case with genealogical sites that have ML OCR creating searchable indices of handwritten documents.