Comment by asveikau

8 hours ago

OCR is very bad.

As an example, look at subtitle rips from DVD and Blu-ray. The discs store subtitles as images of computer-rendered text. A popular target format for rippers is SRT, which stores the subtitles as UTF-8 text and leaves the rendering to the player. So when you rip subtitles to SRT, there's an OCR step.

This is computer-rendered text in only a small handful of fonts, and even decent OCR still often chokes on it.
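For anyone who hasn't looked inside an SRT file, here's a rough Python sketch of the output side of that rip. It assumes the OCR step has already turned each subtitle bitmap into a string; the `Cue` type and the function names are made up for illustration and don't come from any particular ripping tool. The point is that once the text is in SRT, it's just numbered UTF-8 text blocks with timestamps, so whatever the OCR got wrong is baked in.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start_ms: int  # when the subtitle appears, in milliseconds
    end_ms: int    # when it disappears, in milliseconds
    text: str      # hypothetical: whatever the OCR step read off the bitmap


def fmt_ts(ms: int) -> str:
    """Format milliseconds as an SRT timestamp, HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(cues: list[Cue]) -> str:
    """Serialize cues as SRT: 1-based index, time range, text, blank line."""
    blocks = []
    for i, cue in enumerate(cues, start=1):
        blocks.append(f"{i}\n{fmt_ts(cue.start_ms)} --> {fmt_ts(cue.end_ms)}\n{cue.text}\n")
    return "\n".join(blocks)


def write_srt(path: str, cues: list[Cue]) -> None:
    """Write the cues as a UTF-8 text file; the player does the rendering."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(to_srt(cues))


if __name__ == "__main__":
    cues = [Cue(1_000, 3_500, "Hello there."), Cue(4_000, 6_250, "I'm not sure...")]
    print(to_srt(cues))
```

Note there's no font or image data anywhere in the output, which is exactly why the OCR step can't be skipped.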