Comment by asveikau
6 hours ago
OCR is very bad.
As an example, look at subtitle rips for DVD and Blu-ray. The discs store subtitles as images of rendered computer text. A popular output format for rippers is SRT, where the text is stored as UTF-8 and rendered by the player. So when you rip subtitles, there's an OCR step.
This is computer-rendered text in a small handful of fonts, and decent OCR still chokes on it often.
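To make the output side of that pipeline concrete, here is a minimal sketch of writing already-recognized cues out as SRT text. The OCR step itself is not shown; the cue tuples and the helper names `format_timestamp` and `to_srt` are made up for illustration and don't come from any particular ripper.

```python
# Sketch only: assumes the OCR step has already produced (start_ms, end_ms, text) cues.

def format_timestamp(ms: int) -> str:
    """Format milliseconds as the SRT timestamp HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"

def to_srt(cues):
    """Build SRT text: numbered blocks separated by a blank line."""
    blocks = []
    for index, (start_ms, end_ms, text) in enumerate(cues, start=1):
        timing = f"{format_timestamp(start_ms)} --> {format_timestamp(end_ms)}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    return "\n".join(blocks)

# Hypothetical cues standing in for OCR output.
cues = [(1000, 3500, "Hello there."), (4000, 6250, "I'll be back.")]
srt_bytes = to_srt(cues).encode("utf-8")  # the file is stored as UTF-8 text
```

Any misread character from the OCR step ends up verbatim in the `text` field, which is why OCR errors are so visible in ripped subtitles.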