Comment by mpfect

11 hours ago

Turns out "technical debt" also applies to national archives.

More than you can possibly imagine. There are warehouses full of unread papers. Any one of which could contain a reference to somebody or something important.

There was a recently discovered letter, possibly to Shakespeare's wife, which would completely change our understanding of their marriage, and even the way his plays depict women. The only way to find such things is by hordes of grad students trudging their way through fragile paper and messy handwriting.

  • I hate to say it, but might LLMs transform archival work? Not by replacing researchers, but by ingesting everything (or orders of magnitude more than we previously could) and giving the researcher a prioritized list of documents, etc. to examine?

    • If you could automate transcription, it would be an enormous boon to researchers.

      Reading the handwriting would be really hard, and it would be a massive effort to move all that paper. Just handling it is hard; it's not like flipping through mass-manufactured books.

      But I suspect that you could spend a few million dollars to revolutionize the field.

    • Assuming they have been transcribed, yes. The key idea that makes LLMs special is the attention mechanism. Maintaining attention over volumes of data is boring for most humans.

      Also, to be pedantic, just talking about LLMs in this context is a tad reductive. There are many deep learning models involved in archival work that aren't language models.

      I encourage you to read this post for more context on what I mean: https://news.ycombinator.com/item?id=48675179

    • I had ChatGPT translate some old, handwritten French legal documents for family history purposes. It was far more accurate than I expected.

      At scale, with better models, we might have a way to clear out the old archives. Not only could you translate, you could ask it to triage the discoveries. "Would the average person find this noteworthy?"