Comment by mcswell

1 month ago

Obviously this was whimsical when it came out. However, we were creating synthetic data for training and testing OCR in multiple scripts. We would take a web page in some language with a non-Roman script and reproduce it as multiple PDFs using different fonts. We also added various kinds of blurring, using ImageMagick and, of course, this very coffee stains program!
