Comment by ellen364

1 year ago

> Like if this is a crowdsourcing project, why not do a first pass with an LLM and present users with both the image and the best-effort LLM pass?

Possibly for the reason that came up in your other post: you mentioned that you spot-checked the result.

Back when I was in historical research, and occasionally involved in transcription projects, the standard was 2-3 independent transcriptions per document.

Maybe the National Archive will pass documents to an LLM and use the output as one of their 2-3 transcriptions. That could reduce how many duplicate transcriptions humans have to do. But I'll be surprised if they jump to accepting spot-checked LLM output anytime soon.

tptacek 1 year ago

You get that I'm not saying they should just commit LLM outputs as transcriptions, right?