Comment by veidr

3 months ago

Clearly-printed text to a sequence of characters is solved, for use cases that don't require 100% accuracy.

But not for semantic document structure — recognizing that the grammatically incomplete phrase in a larger font is a heading, recognizing subheadings and bullet lists, tables, etc.

Also not for handwritten text, text inside of images (signage and so forth), or damaged source material (old photocopies and scans created in the old days).

Those areas all seem to me where an LLM-based approach could narrow the gap between machine recognition and humans. You have to sort of reason about it from the context as a human to figure it out, too.