Comment by einpoklum

1 year ago

> Arbitrary nonsensical text requires character recognition.

Are you sure? I mean, if it's printed text in a non-connected script, where characters repeat themselves (nearly) identically, then ok, but if you're looking at handwriting - couldn't one argue that it's _words_ that get recognized? And that's ignoring the question of textual context, i.e. recognizing based on what you know the rest of the sentence to be.

4 comments

WhatThisGuySaid 1 year ago

Handwriting with words is not arbitrary nonsensical text

liotier 1 year ago

Yes - my point was about identifier strings such as UUIDs

coredog64 1 year ago

Not really. I have an HTR (handwritten text recognition) use case where the data consists of highly specialized codes. All the OCR software I use gets tripped up by trying to fit the content into the category of English words.

LLMs can help, but I’ve also had issues where the repetitive nature of the content reliably results in terrible hallucinations.