Comment by anaisbetts
10 hours ago
Did you actually check it? Sonnet 3.5 generates text that seems legitimate and generally correct, but misreads important details. LLMs are particularly deceptive because they will be internally consistent - they'll reuse the same incorrect name in both places and will hallucinate information that seems legit, but in fact is just made-up.
Just keep everything in version control, and allow randomized spot checks by experts so you end up with a known error rate.
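The spot-check idea can be sketched out: sample a random subset of pages, have an expert mark each one correct or not, and report the error rate with a confidence interval. Everything here (page IDs, the reviewer callback, the sample size) is hypothetical; the expert's judgment is stood in for by a function.

```python
import math
import random

def estimate_error_rate(page_ids, is_page_wrong, sample_size, seed=0):
    """Randomly spot-check pages; return (error rate, 95% CI half-width).

    is_page_wrong: callable standing in for an expert reviewer's verdict
    on a single page (hypothetical -- in practice a human checks it).
    """
    rng = random.Random(seed)
    sample = rng.sample(page_ids, sample_size)
    errors = sum(1 for p in sample if is_page_wrong(p))
    p_hat = errors / sample_size
    # Normal-approximation 95% confidence interval for a proportion
    half_width = 1.96 * math.sqrt(p_hat * (1 - p_hat) / sample_size)
    return p_hat, half_width

# Hypothetical corpus: 1000 pages, expert flags every 25th page as wrong
pages = list(range(1000))
rate, ci = estimate_error_rate(pages, lambda p: p % 25 == 0, sample_size=200)
```

The point is that with random sampling the error rate is a measured quantity with quantified uncertainty, instead of a vague impression that the output "seems legit."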
Don't use an LLM for this; use other transformer-based OCR models like TrOCR, which have very low CER and WER.
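For readers unfamiliar with the metrics: CER (character error rate) and WER (word error rate) are both normalized edit distances between the model's transcription and a reference transcription. A minimal sketch in plain Python (no OCR model involved; the example strings are made up):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (rolling-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1]

def cer(reference, hypothesis):
    """Character error rate: edits needed per reference character."""
    return edit_distance(reference, hypothesis) / max(len(reference), 1)

def wer(reference, hypothesis):
    """Word error rate: the same metric computed over word tokens."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / max(len(ref_words), 1)
```

A CER of 0.02 means roughly one wrong character in fifty, which is the kind of concrete, checkable number a dedicated OCR model reports and an LLM transcription typically doesn't.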