Comment by dr_dshiv

10 hours ago

Instead of thinking about this as an all-or-nothing outcome, consider how this might work if they were made accessible with LLMs, and then you used randomized spot checks with experts to create a clear and public error rate. Then, when people see mistakes they can fix them.
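For what it's worth, the spot-check idea can be sketched in a few lines. This is only an illustration under assumptions of my own: the sampling unit is a page, experts mark each sampled page as either correct or containing an error, and the published figure is the observed error rate with an approximate 95% Wilson score interval. The names `sample_pages` and `wilson_interval` are hypothetical, not from any existing project.

```python
import math
import random


def sample_pages(page_ids, k, seed=0):
    """Pick k distinct pages uniformly at random for expert review.

    A fixed seed makes the draw reproducible, so the sample can be audited.
    """
    rng = random.Random(seed)
    return rng.sample(list(page_ids), k)


def wilson_interval(errors, n, z=1.96):
    """Wilson score interval for the error rate errors/n.

    z=1.96 corresponds to roughly 95% confidence.
    """
    if n <= 0:
        raise ValueError("need at least one reviewed page")
    p = errors / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    # Hypothetical numbers: 1,000 pages, 100 reviewed, 5 found with errors.
    reviewed = sample_pages(range(1000), 100)
    low, high = wilson_interval(errors=5, n=len(reviewed))
    print(f"observed error rate: {5 / len(reviewed):.1%}")
    print(f"approx. 95% interval: {low:.1%} to {high:.1%}")
```

Publishing the interval rather than a single number makes it clear how much the estimate depends on how many pages were actually checked.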

I’m trying to do this for old Latin books at the Embassy of the Free Mind in Amsterdam. So many of the books have never been digitized, let alone OCR’d or translated. There is a huge amount of work to be done to make these works accessible.

LLMs won’t make it perfect. But isn’t perfect the enemy of the good? If we make it an ongoing project where the source image material is easily accessible (unlike in a normal published translation, where you just have to trust the translator), then the knowledge and understanding can improve over time.

This approach also has the benefit of training readers not to believe everything they read — but to question it and try to get directly at the source. I think that’s a beautiful outcome.

These kinds of ideas just sound to me like "Suppose you had to use broken technology X. How do you make it work?"

  • I don't think you're wrong, but that's because there are no alternative technologies. The only alternative is leaving much more of the archive inaccessible for a much longer period, possibly forever.

    • > The only alternative is leaving much more of the archive inaccessible for a much longer period, possibly forever.

      No, the alternative is volunteers transcribing. Like this project.

      Not every problem needs a computer.
