← Back to context

Comment by upofadown

12 days ago

The font choice is discussed here:

https://www.monperrus.net/martin/perfect-ocr-digital-data

From the linked article:

>The optimal font varies very much on the considered engine. Monospaced fonts (aka fixed-width) such as Inconsolata, are more appropriate in general. ocr-a and ocr-b ocrb give really poor results.

I noticed that they liked using lower case letters for bases where that is a choice. I would think that the larger, upper case letters would be better for OCR. Using lower case for either OCR-A or OCR-B would be a poor idea in any case. The good OCR properties are only provided for the upper case letters. The lower case letters were mostly provided for completeness.

Also, the author might be training on entire blocks of characters rather than individual characters. That isn't really want you want here unless you are using something like words for your representation. OCR-A and OCR-B were designed for character by character OCR.