Comment by einpoklum
8 days ago
> Arbitrary nonsensical text requires character recognition.
Are you sure? I mean, if it's printed text in a non-connected script, where characters repeat themselves (nearly) identically, then ok, but if you're looking at handwriting - couldn't one argue that it's _words_ that get recognized? And that's ignoring the question of textual context, i.e. recognizing based on what you know the rest of the sentence to be.
Handwriting with words is not arbitrary nonsensical text.
Yes - my point was about identifier strings such as UUIDs.
Not really. I have an HTR use case where the data is highly specialized codes. All the OCR software I use is tripped up by trying to fit the content into the category of English words.
LLMs can help, but I’ve also had issues where the repetitive nature of the content can reliably result in terrible hallucinations.