Comment by zvr

12 days ago

Interesting.

I am not sure why, for character-based encodings, they used a general-purpose font (Inconsolata) rather than one made specifically for OCR, or whether an OCR font would have improved the results.

Going further, if you only print a limited alphabet (16, 32 or 39 symbols), why not use a specialized font containing only those characters? The final step would be a bitmap "font" that simply shows each character value as a distinct bit pattern.
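To make the bitmap "font" idea concrete, here is a minimal sketch for a 16-symbol alphabet, assuming each 4-bit value is drawn as a 2x2 cell of dark and light pixels. The cell layout and the function names (`nibble_to_cell`, `encode_bitmap`, `decode_bitmap`, `render`) are made up for illustration and are not from the article; a real scheme would also need alignment marks and error correction.

```python
# Hypothetical sketch: a 16-symbol bitmap "font" where each 4-bit value
# is a 2x2 cell of pixels (1 = dark, 0 = light). Layout is an assumption.

def nibble_to_cell(value: int) -> list[list[int]]:
    """Return the 2x2 cell for a value in 0..15, most significant bit first."""
    bits = [(value >> shift) & 1 for shift in (3, 2, 1, 0)]
    return [bits[0:2], bits[2:4]]

def encode_bitmap(data: bytes) -> list[list[int]]:
    """Lay the cells for every nibble side by side in two pixel rows."""
    top, bottom = [], []
    for byte in data:
        for nibble in (byte >> 4, byte & 0x0F):
            cell = nibble_to_cell(nibble)
            top.extend(cell[0])
            bottom.extend(cell[1])
    return [top, bottom]

def decode_bitmap(rows: list[list[int]]) -> bytes:
    """Read the 2x2 cells back into nibbles and pair them up into bytes."""
    top, bottom = rows
    nibbles = []
    for col in range(0, len(top), 2):
        value = 0
        for bit in top[col:col + 2] + bottom[col:col + 2]:
            value = (value << 1) | bit
        nibbles.append(value)
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))

def render(rows: list[list[int]]) -> str:
    """Draw the bitmap as text, '#' for dark pixels and '.' for light ones."""
    return "\n".join("".join("#" if p else "." for p in row) for row in rows)

if __name__ == "__main__":
    bitmap = encode_bitmap(b"OCR")
    print(render(bitmap))
    print(decode_bitmap(bitmap))
```

At that point the reader is no longer doing OCR at all, just sampling pixels on a grid, which is essentially what 2D barcodes do.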

2 comments

upofadown 12 days ago

The font choice is discussed here:

https://www.monperrus.net/martin/perfect-ocr-digital-data

From the linked article:

>The optimal font varies very much on the considered engine. Monospaced fonts (aka fixed-width) such as Inconsolata, are more appropriate in general. ocr-a and ocr-b give really poor results.

I noticed that they liked using lower case letters for bases where that is a choice. I would think that the larger, upper case letters would be better for OCR. Using lower case for either OCR-A or OCR-B would be a poor idea in any case. The good OCR properties are only provided for the upper case letters. The lower case letters were mostly provided for completeness.
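As a rough illustration of "bases where that is a choice": Base32 (and Base16) are case-insensitive, so you can print upper case and still accept whatever case comes back from the OCR engine. This is a sketch using Python's standard `base64` module; the helper names `to_print_form` and `from_ocr_text` are hypothetical, and it says nothing about how the article's author actually generated the printed text.

```python
import base64

def to_print_form(data: bytes) -> str:
    """Encode bytes as Base32; b32encode emits upper-case letters and digits 2-7."""
    return base64.b32encode(data).decode("ascii")

def from_ocr_text(text: str) -> bytes:
    """Decode Base32 text, ignoring whitespace and accepting either letter case."""
    compact = "".join(text.split())
    return base64.b32decode(compact, casefold=True)

if __name__ == "__main__":
    printed = to_print_form(b"backup key")
    print(printed)
    # Simulate an OCR engine that returned lower case and a stray line break.
    scanned = printed.lower()[:8] + "\n" + printed.lower()[8:]
    print(from_ocr_text(scanned))
```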

Also, the author might be training on entire blocks of characters rather than individual characters. That isn't really what you want here unless you are using something like words for your representation. OCR-A and OCR-B were designed for character-by-character OCR.

numpad0 12 days ago

Is there like, "from pynebraskaguyocr import decodeocra" just out there? I haven't seen any of those for some reason.